Lecture outline

THE  ROLE  OF  STATISTICS  IN 
AUTOMATED  INSPECTION  AND  CLASSIFICATION 

FOR  PROCESS  CONTROL 

Stuart  Geman,  Brown  University 
I.  Introduction 

A. Systematic improvements in computing hardware
have sustained hopes for “intelligent”
computers.

B. In many application areas, such as speech
processing, filtering, and vision, algorithm
and software development have not kept pace
with hardware improvement. Partly, this is
because the appropriate scientific tools have
not been utilized.

C. In many (most?) cases, the mathematical
sciences, especially probability and statistics,
offer the right tools for constructing suitable
algorithms. In this regard, the utility of
“symbolic processing” and other “AI” tools such
as frames and schemas has been overestimated
and oversold.
D. Encouraging a role for the mathematical
sciences:

1. Traditional consulting arrangements
encourage shallow scientific involvement by
mathematicians.

2. Probabilists, statisticians, and other
mathematical scientists can best
contribute by joining and initiating research
efforts in high technology areas, such as
speech, filtering, and vision.

3. “Neural networks” are parallel processing
systems for statistical inference. This
well-funded field could catalyze involvement
of mathematicians, physicists, and
other scientists not traditionally associated
with algorithm development for
“intelligent” processing.

4. Affiliate programs and entrepreneurial
arrangements could encourage mathematicians
to take leadership roles in technology
research.


II.  Computing  horsepower  -  an  illustrative  example 

A. Parallel processing will play an increasingly
important role in weapon systems and
industrial automation.

B.  In  the  next  few  years,  the  most  useful  parallel 
machines  will  likely  have  modest  numbers  of 
relatively  powerful  processors. 

C.  One  example  is  the  MTAP  system,  based  on 
VHSIC  technology,  being  developed  for  the 
Army. 

D.  Other  examples  are  the  multi-DSP  processor 
systems  now  coming  on  the  market. 

1. One example has 4 to 12 DSP processors
with extensive local memory and very
high bandwidth communication.

2. This system costs between $15,000 (for
4 processors) and $30,000 (for 12
processors).

3. Each processor has a 10 (soon to be 16)
megahertz clock, and performs two memory
operations, one arithmetic operation,
and one accumulate per clock cycle.

4.  Future  versions  will  offer  more  processors. 

-slides  of  architecture- 
III. Opportunities for industry and mathematical
sciences

A.  Speech  recognition 

1.  Most  researchers  acknowledge  the  IBM 
framework  to  be  the  most  advanced. 

2. Beautiful mathematical structure,
accommodating vertical processing
(interpretation-guided segmentation), time warping,
...

3. Two severe weaknesses: top level
(language) model, bottom level (signal)
model.

4.  Signal  model:  clustered  feature  vector 
from  FFT  or  LPC  (ARMA  model). 

a.  Much  more  sophisticated  and  powerful 
tools  exist  in  the  statistics  collection. 

b.  Modern  theories  of  time  series,  Markov 
models,  ...  should  be  harnessed. 

c.  These  are  not  off-the-shelf  methods; 
mathematicians  should  be  involved  in 
implementation. 
B. Filtering

1. Sample problem:

• $x_t$, $t = 0, 1, \ldots, T$: “state”, such as a
calibrated setting of a stage handler.

• $y_t$, $t = 0, 1, \ldots, T$: noisy observation of $x_t$.

• Given $y_t$, $t = 0, 1, \ldots, T$, estimate $x_t$,
$t = 0, 1, \ldots, T$.

2. Standard solution: Kalman filter with
estimation of state parameters.

3. Existing hardware permits exploitation of
more general framework.

4. Especially: should construct more
realistic state models.
5. Example:

a. Often have a priori knowledge about
qualitative behavior of state process,
e.g.

i. There exist occasional jumps.

ii. Between jumps, movement
resembles random walk.

b.  Suggest:  build  Markov  model  of  state 
using  Gibbs  representation  of  random 
fields: 

$$P(x_t : 0 \le t \le T) = \frac{1}{Z}\exp\Big\{-\lambda \sum_{t=1}^{T} \phi(x_t - x_{t-1})\Big\}$$

i. $\phi$ “engineered” to capture a priori
knowledge.

ii. $\lambda$ estimated from data.

iii.  Computationally  feasible! 
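
The Gibbs state model of item b can be illustrated with a minimal sketch. The outline does not say how the potential is "engineered", so the saturating `phi` below, the `scale` parameter, and the three sample paths are all hypothetical choices made only for illustration. The point is that a bounded potential charges little for small random-walk steps and a capped amount for an occasional jump.

```python
def phi(step, scale=1.0):
    """Hypothetical saturating potential (an assumption, not from the outline).

    Roughly quadratic for small steps, levelling off at 1 for large ones.
    """
    r = step / scale
    return r * r / (1.0 + r * r)


def chain_energy(path, lam, scale=1.0):
    """Return lam * sum_{t=1..T} phi(x_t - x_{t-1}) for path = [x_0, ..., x_T]."""
    return lam * sum(phi(b - a, scale) for a, b in zip(path, path[1:]))


smooth = [0.0, 0.1, 0.2, 0.1, 0.0]    # random-walk-like drift
jump = [0.0, 0.1, 5.0, 5.1, 5.0]      # drift plus one large jump
rough = [0.0, 2.0, -2.0, 2.0, -2.0]   # large move at every step

for name, path in [("smooth", smooth), ("jump", jump), ("rough", rough)]:
    print(name, round(chain_energy(path, lam=1.0), 3))
```

Because the potential is bounded, a single jump adds at most `lam` to the energy, so paths with occasional jumps remain plausible under the prior while paths that move erratically at every step do not.
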


C.  Vision 

1. Example: automatic defect detection and
classification for wafer manufacturing
process control. Wafer ID's read
automatically for defect cataloguing.

-slides of wafers & wafer characters-

2.  State-of-the-art 

a.  Template  matching  for  defect  detection 
(sells) 

b.  Optical  character  recognition: 
matched  filter  (sells) 

c.  Effectiveness 

i. Both very sensitive to lighting,
focus, and normal process variations
(such as texturing).

ii. Not suitable for detailed
multilayered (end-stage) inspection.

-slides of textures & texture histograms-

3. Mathematical technologies for inspection,
classification and optical character
recognition:

a.  Inspection 

i.  Probabilistic  analysis  of  texture: 
spatial  statistics,  random  fields,  ... 
-slides of textures & texture segmentations-

ii. Statistical estimation of normal
process variations.

iii. Combinatorial optimization of
detection algorithm.

iv. Result: full field of view inspection
(512 x 512 pixels); submicron defect
detection (2 x 2 pixels); 400
milliseconds.

•  processing  time  independent  of 
complexity 

•  one  8  megahertz  DSP 

•  Algorithm  fully  parallel  (as  are 
most  vision  algorithms) 

b.  Classifier 

i. Decision tree/recursive partition
classifier (statistician's version of an
expert system).

ii.  Naturally  accommodates  statistical 
variation. 

c.  Optical  character  recognition 

i. Templates → Relational Templates.

ii. Optimize speed by exploiting
sequential decision framework for
graph matching.
STATISTICAL METHODS FOR TOMOGRAPHIC IMAGE RECONSTRUCTION


Stuart  Geman  and  Donald  E.  McClure 
Division  of  Applied  Mathematics 
Brown University, Providence, Rhode Island, U.S.A.


1.  Introduction 

Interest in statistical approaches to reconstruction problems in emission computed tomography
was greatly enhanced by the work of Shepp and Vardi (1982) on the use of maximum likelihood
(ML)  methods.  There  are  earlier  instances  of  suggestions  to  regard  the  reconstruction  problem  as 
a  statistical  estimation  problem;  however,  the  demonstration  of  the  versatility  of  the  approach  as 
well  as  the  specification  of  algorithms  that  work  were  advanced  substantially  by  Shepp  and  Vardi’s 
work. 

The image reconstruction problem, viewed as an estimation problem, is inherently nonparametric:
one seeks an estimate of a function of general form on a continuous domain. As such, it
is  widely  recognized  that  the  estimates  need  to  be  regularized  or  smoothed,  especially  in  “small 
sample”  implementations.  Various  approaches  to  regularization  have  been  suggested,  including 
penalized  ML,  the  method  of  sieves,  and  Bayesian  methods.  In  Geman  and  McClure  (1985),  we 
proposed  that  a  priori  spatial  information  be  built  into  a  statistical  reconstruction  algorithm,  in 
a  Bayesian  approach,  by  quantifying  spatial  constraints  in  the  form  of  a  Gibbs  prior  distribution. 
In  this  paper  we  will  expand  on  our  earlier  description  and  present  recent  work  on  parameter 
estimation  for  the  Gibbs  priors,  which  leads  to  completely  data-driven  algorithms. 

This  application  to  single  photon  emission  computed  tomography  (SPECT)  follows  a  general 
Bayesian  paradigm  for  problems  in  image  processing  and  vision  laid  out  in  Geman  and  Geman 
(1984)  and  Grenander  (1984). 

1. Following the general procedure, we shall describe in §2 and §3 the deformations that transform
the object X that we wish to reconstruct into the data Y that we can observe. The
deformation is embodied in a probability distribution $\Pi(Y|X)$ reflecting the physics of the
observed phenomenon, the characteristics of the sensor used, etc. Alone, $\Pi(Y|X)$ is the basis
for ML reconstructions.

2. The prior information about the unknown object X is then prescribed in the form of a Gibbs
prior distribution $\Pi(X)$ (§4). In this particular application, the prior is designed to express
spatial constraints, such as “isotope concentrations within subregions of common tissue type
and common metabolic activity are fairly homogeneous.”

3. The prior distribution and the deformation mechanism let us solve, by Bayes' formula, for the
posterior distribution $\Pi(X|Y)$ (§5).

4. With the posterior distribution in hand, we can base reconstruction algorithms on the
statistical principle of minimum risk. In §5 we define procedures for the MAP and MMSE
reconstructions.

5. The special association of the Gibbs prior with a statistical mechanical system translates
into Monte Carlo computational methods, which mimic the dynamics of the physical system.
Stochastic relaxation (§5) is a technique for sampling from the posterior distribution $\Pi(X|Y)$.


In  §6  we  describe  two  methods  for  parameter  estimation  for  a  natural  parameter  of  the  family 
of  Gibbs  priors.  Finally,  we  give  examples  of  the  reconstruction  and  parameter  estimation  methods. 

This  paper  is  intended  as  an  introduction,  with  emphasis  on  the  statistical  perspective.  A 
more  complete  discussion  of  physical,  computational,  and  mathematical  issues  will  be  provided  in 
a  following  paper. 

2.  Single  Photon  Emission  Tomography 

Emission  tomography  is  used  to  determine  the  distribution  of  a  pharmaceutical  in  a  part 
of  the  body  such  as  the  brain,  liver,  or  heart.  Depending  upon  the  pharmaceutical  used,  this 
concentration  can  be  taken  as  a  measure  of  local  blood  flow  (perfusion)  and/or  local  metabolic 
activity.  Glucose,  for  example,  is  taken  up  by  neuronal  cells  in  proportion  to  metabolic  activity, 
and  the  latter  generally  mirrors  recent  electrical  activity.  Thus,  areas  of  the  brain  most  used 
in  performing  a  cognitive  or  motor  task  will  demonstrate  a  relatively  increased  uptake  of  glucose 
immediately  following  the  task.  For  the  heart,  pharmaceuticals  can  be  chosen  whose  uptake  reflects 
local  perfusion.  The  concentration  of  these  pharmaceuticals  can  thereby  be  used  to  assess  the 
adequacy  of  blood  flow  to  the  different  parts  of  the  heart. 

In  SPECT,  pharmaceutical  concentration  is  estimated  by  detecting  photon  emissions  from  an 
injected  or  inhaled  dose  of  the  pharmaceutical  that  has  been  chemically  combined  with  a  radioactive 
isotope.  This  combined  agent  is  called  a  radiopharmaceutical.  The  goal  of  SPECT  is  to  determine 
radiopharmaceutical  concentration  (equivalently,  isotope  concentration  or  density)  as  a  function  of 
position  in  a  region  of  the  body.  Detectors  with  collimators  are  strategically  placed  around  the 
region  of  interest,  and  these  are  able  to  count  photons  emitted  by  radioactive  decay  of  the  isotope. 
A  detector  will  capture  those  photons  which  escape  attenuation  and  whose  trajectories  carry  them 
down  the  bore  of  the  collimator. 

The  determination  from  photon  counts  of  isotope  concentration  as  a  function  of  position  is 
referred  to  as  reconstruction. 

Let X(s) denote the concentration of the radiopharmaceutical at the point s = (x,y) in the
domain $\Omega$ of interest. We shall take $\Omega$ to be a bounded two-dimensional region, though for the
models and methods we will describe there are no essential changes when $\Omega$ is three-dimensional.

We assume that the detectors are arranged in a linear array, at equally spaced lateral sampling
intervals, and that the detector array can be positioned at any orientation $\theta$ relative to the x-axis.
(See Figure 1.) We assume the detectors are of so-called parallel bore type, meaning that they
detect only those photons whose trajectory directions lie in a small interval
$[\theta - \Delta\theta/2, \theta + \Delta\theta/2]$ when the array has orientation
$\theta$. Let L denote the total number of detectors in the array and let $\Delta\sigma$ denote the spacing between
detectors.

The physical effects incorporated in the model are the spatial Poisson process that describes
the sites of the radioactive decays from which photons emanate and the process of photon
attenuation by which photons are annihilated and their energy is absorbed by matter through which
their trajectories pass. Attenuation is accurately described by a linear attenuation function $\mu(s)$
on $\Omega$. The function $\mu$ is assumed to be known; values of $\mu$ for bone, muscle, etc. and for various
photon energies are known a priori or could be measured by transmission tomographic methods.
Attenuation is a memoryless process and we can thus deduce the functional form of the probability
that a photon survives to reach the detector array. When a photon trajectory has direction $\theta$ and
it emanates from site s = (x,y) in $\Omega$, then


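$$P(\text{photon survival}) = \exp\Big\{-\int_{\mathcal{L}(x,y)} \mu(\xi,\eta)\,dl\Big\},$$

where the line integral is taken over the segment $\mathcal{L}(x,y)$ from (x,y) to the detector and dl is
differential arc length.

[Figure 1. Diagram labeled with a detector location $t = (\sigma_j, \theta_k)$ and an emission site $s = (x, y)$.]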
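
The survival probability can be checked numerically with a short sketch. Everything named here (`survival_probability`, the midpoint-rule quadrature, the example attenuation maps) is a hypothetical illustration rather than part of the paper. It integrates an attenuation map along a straight segment and confirms the memoryless property, namely that survival over a path equals the product of survivals over its pieces.

```python
import math


def survival_probability(mu, start, end, n_steps=1000):
    """Approximate exp(-integral of mu along the segment start -> end).

    mu is a function (x, y) -> linear attenuation coefficient.
    The line integral is approximated with the midpoint rule.
    """
    (x0, y0), (x1, y1) = start, end
    length = math.hypot(x1 - x0, y1 - y0)
    dl = length / n_steps
    total = 0.0
    for i in range(n_steps):
        f = (i + 0.5) / n_steps  # fractional position of the step midpoint
        total += mu(x0 + f * (x1 - x0), y0 + f * (y1 - y0)) * dl
    return math.exp(-total)


# Uniform attenuation: the closed form is exp(-mu * length).
uniform = lambda x, y: 0.15
print(survival_probability(uniform, (0.0, 0.0), (10.0, 0.0)))

# Memoryless property with a spatially varying (illustrative) attenuation map.
varying = lambda x, y: 0.1 + 0.01 * x
whole = survival_probability(varying, (0.0, 0.0), (10.0, 0.0))
parts = (survival_probability(varying, (0.0, 0.0), (4.0, 0.0))
         * survival_probability(varying, (4.0, 0.0), (10.0, 0.0)))
print(whole, parts)
```
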

For our sampling design, we shall position the detector array at n equally spaced angles $\theta_k$
for duration T time units at each angle. Then at each angle, we observe the random variables
$Y(t)$, for $t \in D_k = \{(\sigma_j, \theta_k),\ j = 1, \ldots, L\}$, that give the numbers of photons reaching the respective
detectors during the sampling interval. Assuming that (i) photons are generated by a spatially
nonhomogeneous Poisson process with intensity X(s) per time unit, and (ii) the orientations $\theta$ of
photon trajectories are uniformly distributed on $[0, 2\pi)$, we can show that $Y(t)$, for $t \in D = \bigcup_{k=1}^{n} D_k$,
is itself a Poisson process with a nonhomogeneous intensity function described in terms of the
attenuated Radon transform (ART) of X. The ART of X is defined as

$$(R_{\mu,T}X)(\sigma,\theta) = \int_{\mathcal{L}} T\,X(x,y)\,\exp\Big(-\int_{\mathcal{L}(x,y)} \mu(\xi,\eta)\,dl'\Big)\,dl$$

where $\mathcal{L}$ is the line with orientation $\theta$, through point $\sigma$ of the detector array, $\mathcal{L}(x,y)$ is the segment
of $\mathcal{L}$ starting at point (x,y) in $\Omega$, and dl and dl' are differential arc length in the two line integrals.
The intensity function of Y is then given by

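$$EY(t) = \int_{\theta_k - \Delta\theta/2}^{\theta_k + \Delta\theta/2} \int_{\sigma_j - \Delta\sigma/2}^{\sigma_j + \Delta\sigma/2} (R_{\mu,T}X)(\sigma,\theta)\,d\sigma\,d\theta,$$

where $t = (\sigma_j, \theta_k)$. The important feature of this representation is that the intensity function of Y
is the result of applying a positive linear integral operator $A_T$ to X:

$$EY = A_T X. \qquad (2.1)$$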
The model includes the predominant physical effects. Other potentially significant effects,
such as photon scattering and background radiation, are assumed for now to be negligible.
Further, we have not included effects from the sensor, such as imperfect collimation, blurring, and
noise.  We  note,  however,  that  the  reconstruction  methods  described  below,  since  they  are  based 
on  the  generally  applicable  principles  of  maximum  likelihood  and  Bayes  optimality,  are  adaptable 
to  models  incorporating  additional  physical  and  sensor  effects.  Mertus  (1987)  has  made  extensions 
for  scattering  and  collimation  errors. 

3.  Maximum  Likelihood  and  EM 

A variety of reconstruction algorithms for emission tomography are described by Budinger
et al. (1979). The algorithms that are traditionally used are based on ideas of extracting a signal
in  the  presence  of  noise  and  related  methods  of  linear  filtering. 

More  recently,  interest  has  been  heightened  in  the  use  of  algorithms  that  use  fuller  information 
of  the  mathematical  model  sketched  above,  along  with  the  ML  principle.  Shepp  and  Vardi  (1982) 
laid  the  mathematical  foundations  and  developed  effective  algorithms  based  on  EM  (Dempster, 
Laird  and  Rubin  (1977))  for  implementing  ML  reconstructions  in  positron  emission  tomography 
(PET).  A  penetrating  description,  written  from  a  statistician’s  perspective,  is  given  in  Vardi,  Shepp 
and  Kaufman  (1985).  (In  PET,  photon  attenuation  does  not  enter  the  model  relating  isotope 
concentration  to  the  observables.)  McClure  and  Accomando  (1984)  have  developed  the  foundations 
for  applying  ML  to  SPECT  reconstructions  and  have  implemented  EM  algorithms  on  a  variety  of 
computer  systems.  Independently,  Miller,  Snyder  and  Miller  (1985)  have  made  similar  extensions 
of  ML  and  EM  for  SPECT. 

By exploiting properties of the Poisson process, it can be shown that the observables $Y(t)$
are mutually independent and Poisson distributed; the likelihood function is then easily obtained
from (2.1). To carry out a ML reconstruction, we first discretize the domain $\Omega$ into pixels
parameterized by discrete points s in a square lattice S. Now $\{X(s)\}_{s \in S}$ represents a piecewise constant
approximation of the isotope concentration on the continuous domain. When $\Omega$ is discretized, then
equation (2.1) takes the form

$$EY = A_T X,$$

where $A_T$ is a matrix, $A_T = \{A(t,s)\}_{t \in D,\, s \in S}$; commonly, the order of $A_T$ is extremely large and it
may not have full column rank. Now for a given X, the Poisson probability function of Y is

$$\Pi(Y|X) = \prod_{t \in D} \frac{[(A_T X)(t)]^{Y(t)}}{Y(t)!}\,\exp\{-(A_T X)(t)\} \qquad (3.1)$$

where our notation is making convenient abuse of the distinction between a random variable and
its value.

The log-likelihood function is

$$\ln L(X) = \sum_{t \in D} \{-\ln(Y(t)!) + Y(t)\ln[(A_T X)(t)] - (A_T X)(t)\}. \qquad (3.2)$$

The necessary conditions for maximizing $\ln L(X)$ obtained by setting derivatives to zero do not
yield explicit solutions for a maximizing X. Nonetheless, $-\ln L(X)$ is globally convex, and the
ML optimization problem conveniently adapts to the EM method. In general, $-\ln L(X)$ is not
strictly convex; this is an identifiability issue related to the column rank of $A_T$. Conditions for
strict convexity are discussed by Accomando (1984).
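
As a small illustration of the log-likelihood (3.2), the sketch below evaluates it for a dense list-of-lists matrix. The function names and the tiny two-pixel example are hypothetical; the factorial term is computed through `math.lgamma`, and every projected mean $(A_T X)(t)$ is assumed to be strictly positive.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix A."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def log_likelihood(A, x, y):
    """ln L(X) = sum_t { -ln(Y(t)!) + Y(t) ln[(A X)(t)] - (A X)(t) }.

    Assumes every projected mean (A X)(t) is strictly positive.
    """
    total = 0.0
    for yt, mean in zip(y, matvec(A, x)):
        total += -math.lgamma(yt + 1) + yt * math.log(mean) - mean
    return total


# Toy case: each detector sees exactly one pixel, so the ML estimate is X = Y.
A = [[1.0, 0.0], [0.0, 1.0]]
y = [3, 5]
print(log_likelihood(A, [3.0, 5.0], y), log_likelihood(A, [2.0, 5.0], y))
```
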
The EM algorithm becomes an explicit iterative reconstruction procedure. We initialize the
iteration with $\{X^{(0)}(s)\}_{s \in S}$ and update $X^{(\tau)}$ by the formula

$$X^{(\tau+1)} = \{[A_T'(Y \oslash A_T X^{(\tau)})] \oslash A_T'\mathbf{1}\} \odot X^{(\tau)}, \qquad (3.3)$$

where $\mathbf{1}$ is the vector whose components are identically one, $\oslash$ denotes component-by-component
division, and $\odot$ denotes component-by-component multiplication. At each step, the iteration requires
two (large) matrix multiplications. The sequence of iterates converges to an $X_{ML}$ that maximizes
$\ln L(X)$. Consistency results that depend on the sampling design and on the discretization of $\Omega$
can be proved.
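
A minimal sketch of one EM update of the form (3.3) follows, written in plain Python for a small dense matrix. The names `em_step` and `matvec` and the three-detector, two-pixel system are hypothetical, and the prime in (3.3) is read here as the matrix transpose. A practical implementation would use sparse projection operators rather than nested lists.

```python
def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix A."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def em_step(A, x, y):
    """One EM update: x_new = {[A'(y / (A x))] / (A' 1)} * x, componentwise."""
    n_det, n_pix = len(A), len(x)
    ratio = [yt / m for yt, m in zip(y, matvec(A, x))]   # Y divided by A X
    back = [sum(A[t][s] * ratio[t] for t in range(n_det)) for s in range(n_pix)]
    sens = [sum(A[t][s] for t in range(n_det)) for s in range(n_pix)]  # A' 1
    return [x[s] * back[s] / sens[s] for s in range(n_pix)]


# Toy system: 3 detectors, 2 pixels, noise-free data generated from x_true.
A = [[1.0, 0.5], [0.5, 1.0], [1.0, 1.0]]
x_true = [2.0, 4.0]
y = matvec(A, x_true)          # [4.0, 5.0, 6.0]

x = [1.0, 1.0]                 # any strictly positive start
for _ in range(500):
    x = em_step(A, x, y)
print(x)
```

Each update keeps the iterate positive, and the sensitivity-weighted sum of the iterate equals the total number of counts after every step, which gives a quick sanity check on an implementation.
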

Figure  2C  in  §5  shows  an  example  of  a  ML  reconstruction  for  a  simulation  experiment.  The 
true  isotope  density  used  for  the  simulated  data  is  depicted  in  Panel  A  of  Figure  2.  The  noisy 
appearance  of  the  ML  reconstruction  is  not  atypical,  even  though  the  sample  size  is  rather  large 
in  this  experiment  for  estimating  the  32  x  32  discrete  image.  The  high  degree  of  local  irregularity 
occurs because ML builds in no spatial information, e.g. about relative locations of pixels in the
grid. Snyder and Miller (1985), recognizing the inherent nonparametric nature of the reconstruction
problem,  have  suggested  using  Grenander’s  method  of  sieves  (Grenander  (1981))  to  regularize  the 
ML  estimates.  Accomando  (1984)  also  uses  sieves  to  study  consistency  questions. 

4.  Gibbs  Prior  Distribution 

We suggest a Bayesian formulation for incorporating prior spatial constraints into the
reconstructions. We shall construct a prior distribution on X that captures simple prior expectations
about  the  qualitative  nature  of  the  isotope  density.  Mainly,  we  wish  to  exploit  the  anticipated 
smoothness  of  X.  Neighboring  locations  will  typically  have  similar  intensity  levels.  But  we  must 
also  accommodate  sharp  changes  in  concentration,  which  might  occur  across  an  arterial  wall  or 
across  a  boundary  between  two  tissue  types. 

In the spirit of nonparametric estimation, we might construct the prior on a suitable space of
functions $X : \Omega \to \mathbb{R}$. It is more convenient, however, to do the construction on the discrete domain
S introduced in §3. The prior, therefore, is on the array $X = \{X(s)\}_{s \in S}$. The range of values
of X(s) will be confined to a compact interval, usually [0,255], and might be further restricted to
only the integer values in the interval. As a further convenience, we will restrict ourselves to priors
with Gibbs representation

$$\Pi(X) = \frac{1}{Z}\exp\{-U(X)\} \qquad (4.1)$$

where Z is the normalizing constant, $Z = \int \exp\{-U(X)\}\,dX$, and $U : \mathbb{R}^S \to \mathbb{R}$ is known as the
“energy”. As it stands, the Gibbs representation is only mildly restrictive since U is arbitrary.
However, we shall restrict U to involve only “nearest neighbor” interactions among the components
of X.

We employ the Gibbs representation because it is easier to design an energy function with
desired properties (such as localization of interactions, Markovian restrictions on conditional
distributions, ...) than it is to construct a distribution $\Pi$ directly. We will design U so that the expected
configurations have low energy, as they do in a real physical system. The expected configurations
are those for which typical neighboring sites $s, t \in S$ have similar intensities X(s), X(t). This is a
local constraint and it is conveniently captured by a locally composed energy function U,

$$U(X) = \beta \sum_{[s,t]} \phi(X(s) - X(t)) + \frac{\beta}{\sqrt{2}} \sum_{\langle s,t \rangle} \phi(X(s) - X(t)). \qquad (4.2)$$
Here we use [s, t] to indicate that s and t are nearest horizontal or vertical neighbors in the lattice S,
and $\langle s,t \rangle$ to denote diagonal neighbors. The constant $\beta$ is positive and the function $\phi(\xi)$ is even
and minimized at $\xi = 0$. Thus U is minimized by configurations of constant intensity. Under the
Gibbs distribution (4.1) the more likely isotope densities are those with small site-to-site variation
in intensity.

This definition of $\phi$ and U induces a graph on S in which each pixel site s is linked to its eight
nearest neighbors in the square lattice. The distribution $\Pi$ then determines a Markov random field
with this neighborhood structure.

To achieve the desired properties for the more likely isotope densities, the exact form of $\phi$ is
probably not important, but its qualitative features can make a difference. We have experimented
with $\phi$'s that are increasing in $\xi$ for $\xi > 0$. An obvious choice is $\phi(\xi) = \xi^2$, but then under $\Pi(X)$,
large intensity gradients, as would be associated with certain natural boundaries, are exceedingly
unlikely. Instead, we use functions of the form

$$\phi(\xi) = \frac{-1}{1 + (\xi/\delta)^2} \qquad (4.3)$$

where $\delta$, like $\beta$, is a constant to be fixed later.
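
The nearest-neighbor energy can be sketched in a few lines of Python. Two ingredients are assumptions of this sketch rather than statements recovered from the text: the bounded potential $\phi(\xi) = -1/(1 + (\xi/\delta)^2)$, and a weight of $\beta/\sqrt{2}$ on diagonal pairs. The function names and test images are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def phi(xi, delta):
    """Assumed bounded potential: even, minimized at 0, tends to 0 for large |xi|."""
    return -1.0 / (1.0 + (xi / delta) ** 2)


def gibbs_energy(X, beta, delta):
    """Energy over all 8-neighbor pairs of a 2-D list X, each pair counted once."""
    rows, cols = len(X), len(X[0])
    u = 0.0
    for i in range(rows):
        for j in range(cols):
            if j + 1 < cols:                      # horizontal pair
                u += beta * phi(X[i][j] - X[i][j + 1], delta)
            if i + 1 < rows:                      # vertical pair
                u += beta * phi(X[i][j] - X[i + 1][j], delta)
            if i + 1 < rows and j + 1 < cols:     # diagonal pair (down-right)
                u += beta / SQRT2 * phi(X[i][j] - X[i + 1][j + 1], delta)
            if i + 1 < rows and j - 1 >= 0:       # diagonal pair (down-left)
                u += beta / SQRT2 * phi(X[i][j] - X[i + 1][j - 1], delta)
    return u


flat = [[3, 3, 3], [3, 3, 3], [3, 3, 3]]
edge = [[0, 0, 5], [0, 0, 5], [0, 0, 5]]      # one sharp boundary
checker = [[0, 5, 0], [5, 0, 5], [0, 5, 0]]   # variation everywhere

for name, img in [("flat", flat), ("edge", edge), ("checker", checker)]:
    print(name, round(gibbs_energy(img, beta=1.0, delta=0.7), 3))
```

With a bounded potential, the image with a single sharp boundary pays only a modest energy premium over the flat image, while the image that varies at every site pays much more.
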
There are two free parameters in the specification of U: $\delta$ is easily interpreted as a scale
parameter on the range of values of X(s) and $\beta$ controls the “strength” of the interactions between
a pixel and its neighbors. The parameter $\beta$ is a natural parameter of the exponential family (4.1), and admits
meaningful statistical and physical interpretations. From the physical viewpoint, $\beta$ is the reciprocal
of temperature for the statistical mechanical system defined by (4.1). From the statistical viewpoint,
it will be seen as a “smoothing parameter” controlling the tradeoff for our reconstructions between
the influence of the observables and the influence of the prior constraints.

Levitan and Herman (1987) have recently proposed the use of Gaussian priors in a Bayesian
formulation. Liang and Hart (1987) also suggest the use of Gaussian priors, as well as others,
deduced by max-ent arguments from prior constraints on low-order moments of X. Our earlier
experiments with the quadratic energy function indicated that the resulting Bayesian algorithms
oversmoothed real boundaries where the difference (X(s) - X(t)) should be allowed to be large.
The finite asymptotic behavior of our $\phi$-function was designed to mitigate this oversmoothing.

5.  Posterior  Distribution  and  Bayes  Optimal  Reconstructions 
From (3.1) and (4.1) the posterior distribution on X is

$$\Pi(X|Y) = \frac{1}{Z(Y)}\exp\Big\{-U(X) + \sum_{t \in D} [Y(t)\ln[(A_T X)(t)] - (A_T X)(t)]\Big\} \qquad (5.1)$$

where Z(Y) is a normalizing constant that depends on Y.

We have developed algorithms for two Bayes optimal reconstructions of X: the
minimum-mean-squared-error (MMSE) estimator

$$X^* = E(X|Y) \qquad (5.2)$$

and the maximum-a-posteriori (MAP) estimator, which maximizes the value of $\Pi(X|Y)$ or
equivalently minimizes the posterior energy

$$U(X) - \sum_{t \in D} [Y(t)\ln[(A_T X)(t)] - (A_T X)(t)]. \qquad (5.3)$$
The algorithm for each of these reconstructions is built around a technique for simulating
computationally the dynamics of a statistical mechanical system with energy given by (5.3). Details of
the generic algorithm, a variant of the Metropolis algorithm (Metropolis et al. (1953)) known as
stochastic relaxation (SR), are given in Geman and Geman (1984); the idea is sketched below.

Notice  in  (5.3)  that  we  have  the  usual  equivalence  between  Bayesian  MAP  estimation  and 
so-called  penalized  ML.  ML  maximizes 

$$\sum_{t \in D} [Y(t)\ln[(A_T X)(t)] - (A_T X)(t)],$$

whereas MAP estimation includes the “penalty term” $-U(X)$, which penalizes lack of smoothness.
One advantage, we believe, of the Bayesian viewpoint is that it suggests mechanisms for estimating
the required degree of smoothness, which amounts to estimating the pivotal parameter $\beta$ in the
Gibbs prior. We focus on this estimation problem in the next section.

MMSE Algorithm. The computational method is iterative. We initialize $X = X^{(0)}$. In practice,
we choose a “good” initialization such as the EM reconstruction, but easy theory says that
convergence is independent of the initialization. We visit each site s in the pixel array, successively in
any order, and replace X(s) by a value sampled from the conditional distribution on X(s), under
(5.1) and conditioning on all $X(t)$, $t \neq s$; this is the essence of stochastic relaxation (SR). The
iterates $X^{(\tau)}$ form a Markov chain with equilibrium distribution (5.1). The ergodicity of the chain
guarantees that an ergodic average of $X^{(\tau)}$ will converge to $X^*$ a.s. In practice, we compute
N iterates and average the final M, with choices such as N = 25 and M = 5. The selection of
suitable M and N can be guided by monitoring stabilization of statistics of the successive iterates
$X^{(\tau)}$.
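
The site-by-site sampling described above can be sketched for a tiny discrete problem. This is an illustrative sketch under stated assumptions, not the authors' implementation: the potential `phi`, the diagonal weight $\beta/\sqrt{2}$, the discrete strictly positive `levels`, the identity projection matrix, and all function names are hypothetical. The data term is recomputed in full at every trial value, which is affordable only for toy sizes.

```python
import math
import random

SQRT2 = math.sqrt(2.0)


def phi(xi, delta):
    """Assumed bounded interaction potential."""
    return -1.0 / (1.0 + (xi / delta) ** 2)


def local_prior_energy(X, i, j, value, beta, delta):
    """Prior energy terms that involve site (i, j) when it holds `value`."""
    rows, cols = len(X), len(X[0])
    e = 0.0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            ni, nj = i + di, j + dj
            if (di, dj) != (0, 0) and 0 <= ni < rows and 0 <= nj < cols:
                weight = beta if 0 in (di, dj) else beta / SQRT2
                e += weight * phi(value - X[ni][nj], delta)
    return e


def data_term(A, X, y):
    """sum_t [Y(t) ln (AX)(t) - (AX)(t)], X flattened row by row; needs (AX)(t) > 0."""
    flat = [v for row in X for v in row]
    total = 0.0
    for row, yt in zip(A, y):
        mean = sum(a * v for a, v in zip(row, flat))
        total += yt * math.log(mean) - mean
    return total


def sr_sweep(X, A, y, beta, delta, levels, rng):
    """Visit every site once, resampling it from its posterior conditional."""
    for i in range(len(X)):
        for j in range(len(X[0])):
            log_w = []
            for v in levels:
                X[i][j] = v
                log_w.append(-local_prior_energy(X, i, j, v, beta, delta)
                             + data_term(A, X, y))
            top = max(log_w)  # subtract the maximum for numerical stability
            weights = [math.exp(lw - top) for lw in log_w]
            X[i][j] = rng.choices(levels, weights=weights)[0]
    return X


def mmse_estimate(X0, A, y, beta, delta, levels, n_sweeps=25, n_avg=5, seed=0):
    """Run n_sweeps sweeps and average the final n_avg iterates."""
    rng = random.Random(seed)
    X = [list(row) for row in X0]
    acc = [[0.0] * len(X[0]) for _ in X]
    for k in range(n_sweeps):
        sr_sweep(X, A, y, beta, delta, levels, rng)
        if k >= n_sweeps - n_avg:
            for i, row in enumerate(X):
                for j, v in enumerate(row):
                    acc[i][j] += v
    return [[a / n_avg for a in row] for row in acc]


# Toy 2 x 2 image in which each detector sees exactly one pixel.
A = [[1.0 if s == t else 0.0 for s in range(4)] for t in range(4)]
y = [2, 2, 9, 9]
levels = list(range(1, 13))
X0 = [[5, 5], [5, 5]]
print(mmse_estimate(X0, A, y, beta=1.0, delta=0.7, levels=levels))
```
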

MAP Algorithm. Computing the minimum of (5.3) is, in general, a hard problem. The method
of simulated annealing can be implemented to yield a sequence $X^{(\tau)}$ converging in distribution
to a MAP estimator X'. The procedure is similar to SR. The fundamental ideas are described
in Pincus (1970), Cerny (1982), and Kirkpatrick, Gelatt and Vecchi (1983). See also Geman and
Geman (1984) for applications to image processing.

For the design of feasible algorithms, we are guided by pragmatism as well as by the theoretical
underpinnings of SR and simulated annealing. First we compute the ML reconstruction by EM.
Then, in the language of simulated annealing, we “run” the physical system with posterior energy
(5.3) at zero temperature. When our state-space (the range of values for X(s)) is discrete, this
amounts to using Besag's method of Iterated Conditional Modes (ICM), Besag (1986). When the
state-space is a continuous interval and the temporal index is also continuous ($\tau \in [0,\infty)$), we
implement this step by performing gradient descent on (5.3) starting at the EM reconstruction.
The local minimum of (5.3) obtained by ICM or by gradient descent is our approximate MAP
estimate of X.
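
The zero-temperature step for a discrete state space can be sketched as coordinate-wise minimization of a posterior energy. The set-up below is a deliberately simplified, hypothetical one: a one-dimensional chain of pixels, an identity projection matrix so that the ML estimate is the data itself, an assumed bounded potential `phi`, and strictly positive integer `levels`. It illustrates the ICM idea only and is not the authors' code.

```python
import math


def phi(xi, delta=0.7):
    """Assumed bounded interaction potential."""
    return -1.0 / (1.0 + (xi / delta) ** 2)


def posterior_energy(x, y, beta=1.0):
    """Prior energy minus Poisson data term for a 1-D chain with A = identity."""
    prior = beta * sum(phi(b - a) for a, b in zip(x, x[1:]))
    data = sum(yt * math.log(v) - v for yt, v in zip(y, x))
    return prior - data


def icm(x0, levels, energy, max_sweeps=50):
    """Iterated conditional modes: set each site to its energy-minimizing level."""
    x = list(x0)
    for _ in range(max_sweeps):
        changed = False
        for s in range(len(x)):
            current = x[s]
            best_v, best_e = current, energy(x)
            for v in levels:
                x[s] = v
                e = energy(x)
                if e < best_e:           # strict decrease, so the loop terminates
                    best_v, best_e = v, e
            x[s] = best_v
            changed = changed or best_v != current
        if not changed:                  # no single-site change lowers the energy
            break
    return x


y = [2, 3, 2, 9, 8, 9]
levels = list(range(1, 13))
energy = lambda x: posterior_energy(x, y, beta=1.0)

x_start = list(y)                        # ML estimate when A is the identity
x_icm = icm(x_start, levels, energy)
print(x_icm, energy(x_start), energy(x_icm))
```

The result depends on the starting point, which is why a good initialization such as the ML reconstruction matters.
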

Note that ICM and gradient descent do not guarantee convergence to a global minimum of
(5.3). The rationale for making a judicious choice for the initialization is to capture a “good” local
minimum for the approximate MAP reconstruction.

Figure 2, Panels D, E, and F, shows approximate MAP reconstructions of the known phantom
depicted in Figure 2A. First the ART of the phantom X was computed, for n = 60 sampling angles
and L = 64 lateral sampling steps. The nonuniform attenuation function $\mu$ depicted in 2B was used
to compute the ART; it builds a very substantial attenuation effect into the model. The Poisson
data Y was generated to satisfy (3.1). Figure 2C shows the approximate ML reconstruction after 54
iterations of EM. For the Bayesian reconstructions, we used the prior of (4.1)-(4.3), with $\delta = 0.7$,
and with the range of X(s) in [0, 15]. Each of the Bayesian reconstructions is computed by gradient
descent starting from the EM estimate (EM-GrD). The effect of different choices for $\beta$ on the degree
of smoothing is apparent. We shall discuss estimation of $\beta$ in the next section.

6.  Parameter  Estimation 

The choice of $\beta$ is critical. With $\beta = 0$ the estimator is undersmoothed, and in fact MAP
estimation is just ML, since the prior is uniform. If $\beta$ is too large, the estimator is too faithful
to the prior and is oversmoothed. The parameter $\delta$ is also important, though we have found
that (i) its value can usually be set based on information about the range of values {X(s)}, and
(ii) reconstructions are not sensitive to moderate changes in $\delta$. The discussion here will focus on $\beta$.

Because of the setting in which reconstruction algorithms are actually used, it is desirable
to design estimation methods that work with a sample Y of size one from the observable process.
The isotope density X is assumed to be drawn from a Gibbs prior with unknown $\beta$, but known $\delta$
(4.3). We shall estimate $\beta$ from Y and use the estimate $\hat\beta$ in the MMSE or MAP reconstruction
program. It is reasonable to do this with a single observation Y, since Y contains a large amount
of data about X, which, in turn, contains a large amount of data about the local energy function
U(X).

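To be more explicit about the dependency on $\beta$ of the prior and posterior distributions, we
introduce the function

$$V(X) = \sum_{[s,t]} \phi(X(s) - X(t)) + \frac{1}{\sqrt{2}} \sum_{\langle s,t \rangle} \phi(X(s) - X(t)).$$

V is just $U/\beta$. The prior is now written

$$\Pi(X) = \frac{1}{z_\beta} \exp\{-\beta V(X)\}$$

and the posterior, given Y, is

$$\Pi(X \mid Y) = \frac{1}{z_\beta(Y)} \exp\Big\{ -\beta V(X) + \sum_t \big[ Y(t) \ln((A_\mu X)(t)) - (A_\mu X)(t) \big] \Big\}.$$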


Now V(X) is a complete-data sufficient statistic for $\beta$. If we were able to observe X directly,
then we could, in principle, solve the likelihood equation

$$E_\beta[V(X)] = V(X) \tag{6.1}$$

for the ML estimate of $\beta$. The left-hand side of (6.1) is strictly decreasing in $\beta$ and thus (6.1) yields
a unique root $\hat\beta$.
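
Because the left-hand side of (6.1) is strictly decreasing in $\beta$, a root can be bracketed and
located by bisection. The sketch below illustrates this in Python under an explicit assumption:
`toy_curve` is a hypothetical closed-form stand-in for $E_\beta[V(X)]$, which in practice has no
closed form and must be estimated by simulation. The function names are ours.

```python
import math

def solve_decreasing(curve, target, lo, hi, tol=1e-8):
    """Return beta in [lo, hi] with curve(beta) = target, for a strictly decreasing curve."""
    if not (curve(hi) <= target <= curve(lo)):
        raise ValueError("target is not bracketed on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if curve(mid) > target:
            lo = mid   # curve is still above the target, so the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

def toy_curve(beta):
    """Hypothetical strictly decreasing stand-in for E_beta[V(X)]."""
    return -100.0 * (1.0 - math.exp(-beta))

observed_v = toy_curve(1.0)   # pretend V(X) was observed for an X drawn at beta = 1
beta_hat = solve_decreasing(toy_curve, observed_v, lo=0.0, hi=6.0)
```

Monotonicity is what makes the bracket test and the halving rule valid; without it the
equation could have several roots and bisection would find only one of them.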

Our situation is more complicated than this since we do not observe X, but instead we see
only the incomplete data Y. We have a classic setup for application of EM. The EM algorithm,
when it converges, will yield a root of the incomplete-data likelihood equation

$$E_\beta[V(X)] = E_\beta[V(X) \mid Y]; \tag{6.2}$$

see Dempster, Laird and Rubin (1977). We note that there is no proof of uniqueness of roots of
(6.2). Conceptually, (6.2) is solved at the intersection of two monotone decreasing functions of $\beta$.
Whether (6.2) does admit multiple solutions is an open and elusive theoretical question.

To solve (6.2), the EM algorithm consists of two alternating steps: estimation of the right-
hand side of (6.2) for prescribed $\beta$ (E-step) and computation of the root $\beta$ of (6.2), substituting
the current estimate of $E_\beta[V(X) \mid Y]$ on the right-hand side (M-step). Specifically, we fix an
initial $\beta = \beta^0$ and an initial $X = X^0$ (and hence $V^0$). Then iterate the following two steps.

E-step. Estimate the complete-data sufficient statistic:

$$V^{(\tau+1)} = E_{\beta^{(\tau)}}[V(X) \mid Y] \tag{6.3a}$$

M-step. Determine $\beta^{(\tau+1)}$ as the solution of

$$E_\beta[V(X)] = V^{(\tau+1)}. \tag{6.3b}$$


The first step is done using SR, using say ten steps of SR and averaging the last five values of
$V(X^{(\tau)})$. The second step is a simple root-finding calculation once the curve $E_\beta[V(X)]$ is known.
Conveniently, the SR procedure simultaneously yields updates $X^{(\tau)}$ of the MMSE reconstruction.
Thus (6.3a) and (6.3b) together give a completely data-driven method of reconstruction.
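
The alternation of (6.3a) and (6.3b) can be sketched as a short loop. In the Python fragment
below both steps are passed in as functions; the two toy functions are hypothetical closed-form
stand-ins (in the procedure described above the E-step is an SR average and the M-step inverts a
simulated curve), chosen only so that the iteration can be followed by hand.

```python
def em_for_beta(e_step, prior_curve_inverse, beta0, n_iter=20):
    """Alternate (6.3a) and (6.3b): estimate V for the current beta, then re-solve for beta."""
    beta, history = beta0, []
    for _ in range(n_iter):
        v_next = e_step(beta)                # E-step: estimate of E_beta[V(X) | Y]
        beta = prior_curve_inverse(v_next)   # M-step: solve E_beta[V(X)] = v_next
        history.append(beta)
    return beta, history

# Toy stand-ins, both decreasing in beta; the two curves intersect at beta = 2.
def toy_e_step(beta):
    return 8.0 - beta                        # plays the role of E_beta[V(X) | Y]

def toy_prior_curve_inverse(v):
    return (10.0 - v) / 2.0                  # inverts the toy prior curve 10 - 2 * beta

beta_hat, history = em_for_beta(toy_e_step, toy_prior_curve_inverse, beta0=0.0)
```

With these stand-ins each pass maps $\beta$ to $1 + \beta/2$, so the iterates rise from 1 toward
the intersection at 2. This illustrates only the fixed-point structure of (6.2) and says nothing
about the convergence behavior on real data.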

The construction of $E_\beta[V(X)]$ as a function of $\beta$ can be done “off line”, once and for all.
We have done this using SR to simulate 230 configurations X from the prior (4.1) for $\beta$-values
ranging from 0 to 6. Five replications were done at each of forty-six values of $\beta$. The resulting
curve, fit by a cubic-spline regression function, is depicted in Figure 3. The calculation of this curve
required forty-one hours of CPU time, using a highly optimized program on the 100 Megaflop Star
Technologies ST100 Array Processor.

J. Mertus (1987) has developed an efficient vectorized FORTRAN program for the EM
estimation/reconstruction procedure described above. Each E-step, with ten sweeps of SR, takes on
the order of seven minutes of CPU time on an IBM 3090 or about three minutes on a CYBER 205,
working  on  a  64  x  64  pixel  lattice  S,  for  isotope  densities  X  having  their  support  on  a  disk  of 
diameter  44  pixels  (about  22cm)  and  with  a  range  of  64  grey  levels.  (These  values  correspond  to 
our  real  data  sets.)  The  computational  requirements  are  enormous,  but  not  prohibitive. 

To circumvent the computational demands of EM, we have devised and experimented with a
moment method for estimating $\beta$. The goal is to have a direct estimation method for $\beta$ that can be
applied to the observable Y without requiring intermediate reconstruction of X. We construct a
statistic M(Y) based on the notion that the smoothness of Y will reflect the magnitude of $\beta$ in the
same way that the smoothness of X does. The exact form of M(Y) is also guided by our knowledge
of the Poisson distribution of Y and our ability to compute theoretical moments of the Poisson random
variables.

For the detector bin at angle $\theta_k$ and at sampling step $\sigma_j$, denote $t = (\sigma_j, \theta_k)$ and
$t^+ = (\sigma_{j+1}, \theta_k)$. Also, introduce the notation $a(t) = (A_\mu 1)(t)$, where 1 is the vector with components
identically equal to one; a(t) is simply the row-sum of $A_\mu$ associated with the detector at location
t. Then define the moment statistic

Figure 3.


The inner sum restricts the moment to the central part of the support of the isotope density to
avoid edge effects. The expectation of M(Y), for given X, is a measure of roughness of normalized
ART projections of X:

$$E(M(Y) \mid X) = \sum_k \sum_j \left[ \frac{A_\mu X(t)}{a(t)} - \frac{A_\mu X(t^+)}{a(t^+)} \right]^2 \tag{6.5}$$
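
To see how a statistic of the observable counts can have an expectation of the form (6.5), note
that for independent Poisson counts a squared difference of normalized counts overestimates the
squared difference of the normalized means by a variance term, and that this term can itself be
estimated without bias from the counts. The Python sketch below uses one such bias-corrected
summand. That summand is an assumption made for illustration and is not a quotation of (6.4);
the function names and the numerical values are hypothetical.

```python
import math

def moment_term(y, y_plus, a, a_plus):
    """Squared difference of normalized counts, minus a Poisson bias correction."""
    return (y / a - y_plus / a_plus) ** 2 - y / a ** 2 - y_plus / a_plus ** 2

def moment_statistic(counts, row_sums, j_lo, j_hi):
    """Sum the terms over all angles k and lateral steps j_lo <= j < j_hi,
    pairing each bin t = (j, k) with its neighbor t+ = (j + 1, k)."""
    total = 0.0
    for k in range(len(counts)):
        for j in range(j_lo, j_hi):
            total += moment_term(counts[k][j], counts[k][j + 1],
                                 row_sums[k][j], row_sums[k][j + 1])
    return total

def poisson_pmf(n, lam):
    return math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))

def expected_term(lam, lam_plus, a, a_plus, n_max=80):
    """Expectation of one term for independent Poisson counts (truncated double sum)."""
    return sum(poisson_pmf(m, lam) * poisson_pmf(n, lam_plus)
               * moment_term(m, n, a, a_plus)
               for m in range(n_max) for n in range(n_max))

lam, lam_plus, a, a_plus = 6.0, 9.0, 2.0, 2.5
lhs = expected_term(lam, lam_plus, a, a_plus)
rhs = (lam / a - lam_plus / a_plus) ** 2   # the kind of summand appearing in (6.5)
```

Here `lhs` and `rhs` agree to within the truncation and rounding error of the double sum, which
reflects the identity $E[Y^2] = \lambda + \lambda^2$ for a Poisson variable with mean $\lambda$.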


We anticipate that the expectation $E_\beta[M(Y)]$ with respect to the prior will have the same general
behavior as $E_\beta[V(X)]$ in (6.1). Accordingly, we define the moment estimate $\beta^*$ of $\beta$ as the root of
the equation

$$E_\beta[M(Y)] = M(Y). \tag{6.6}$$

The effort to compute $\beta^*$ is trivial, once the left-hand side of (6.6) is known as a function of $\beta$.

We have constructed the curve describing $E_\beta[M(Y)]$ using the same simulated X-data that
generated $E_\beta[V(X)]$ in Figure 3. Figure 4 shows the resulting curve; it does, indeed, exhibit the
same qualitative behavior as the curve in Figure 3.

A variety of experiments have been done with both the EM and moment methods of estimating
$\beta$. The ideal circumstance, of course, is when the model truly fits the data.

In one such experiment, an X-array was generated from the prior (4.1) with $\beta = 1$. (As
above, we used a 64 x 64 pixel lattice, 64 grey levels, a disk of diameter 44 pixels for the support of
X, and a uniform attenuation function for the construction of $A_\mu$.) In implementing the E-step,
ten passes of SR were performed and the last five values of $V(X^{(\tau)})$ were averaged to estimate the
right-hand side of (6.3a). When $\beta^0 = 0.0$, the successive iterates of $\beta^{(\tau)}$ from the M-step were 0.63,
0.85, 0.95, 1.01, and 1.03. When $\beta^0 = 6.9$, the successive iterates were 2.26, 1.15, 1.08, 1.05, 1.05, and 1.04.
For the same X-array, five independent replications of the observable Y process were generated
and the moment method yielded estimates $\beta^*$ of 0.97, 1.00, 1.02, 1.06, and 0.98; the five estimates
have mean 1.005 and standard deviation 0.034.

Figure 4.


For more thorough testing of the moment method, a test set of X-arrays (independent of
the set used to construct the curves in Figures 3 and 4) was generated with $\beta$-values ranging
from 0.5 to 2.5. For each $\beta$, five X-arrays were generated, and for each X-array, five independent
replications of the Y process were simulated. Figure 5 depicts the estimate errors $\beta^* - \beta$ for each
of the twenty-five experiments at each $\beta$-value. The dispersion of the errors as a function of $\beta$ is
what one would anticipate from the slope of $E_\beta[M(Y)]$.


7.  Reconstruction  Experiments 

We  report  on  two  experiments  which  have  been  run  on  real  and  simulated  data  to  learn 
about  the  performance  of  the  Bayesian  reconstruction  methods  in  cases  for  which  the  underlying 
model  does  not  fit  exactly.  One  simulation  experiment  was  designed  to  test  the  versatility  and 
robustness  of  the  methods  to  known  departures  from  the  model.  The  other  experiment  illustrates 
the  performance  of  the  algorithms  on  real  data  from  a  lung  section. 

The pseudo-grey-level images in Figures 6 and 7 associate high values in [0,63] with black and
low values with white. Our ability to present pictorial examples is limited by the printing process
for this volume. Interested readers can obtain higher resolution copies of photographs on request to
D.E. McClure.

Figure 5.

Experiment 1. A phantom isotope density (Figure 6A) was designed to have a combination of
(i) large-scale structure, including subregions of $\Omega$ with considerable differences in intensity, and
(ii) local irregularity of the same qualitative nature as that of sample functions from the Gibbs
model (4.1)-(4.3), yet not precisely fitting the Gibbs model. Two functions were averaged to form
the phantom. First, an array with a sharp spike in intensity (near the center, below the middle)
was constructed. Second, an array was sampled from (4.1)-(4.3) with parameter values $\beta = 1$
and $\delta = 12$. Intuitively, the local structure of the average will be governed by the array sampled
from the Gibbs model. But observe that the rescaling of this array due to the arithmetic averaging
means that it will not exactly fit a model from the same family. Roughly speaking, the averaging
has the effect of smoothing the array so that it will be better described by a Gibbs model with
larger $\beta$-value, assuming $\delta$ is fixed for now. We thus anticipate estimated values of $\beta$ larger than
the value $\beta = 1$ used to generate the Gibbsian part of the averaged phantom.

To simulate the emitted photons, the constant linear attenuation function $\mu = 0.2$ was chosen,
corresponding to approximately ten percent attenuation per centimeter for our scaling of the real
system. A total of 663,144 photons were counted at 64 angles $\theta$, with L = 64 bins on the linear
detector array; in actuality, only 44 of the bins collect positive counts because the support of the
phantom is contained in a smaller disk of diameter 44 pixels.

Reconstructions are depicted in Panels B-F of Figure 6. All were constructed on the range
[0,63] with parameter $\delta = 12$. The MMSE reconstruction, with $\beta$ estimated by EM (6.3), is shown
in Panel 6B. When $\beta$ was initialized at $\beta^0 = 0.0$, the successive EM iterates from (6.3b) were 0.52,
0.72, 0.85, 1.01, 1.18, 1.29, 1.34, 1.38, 1.40, 1.41, ..., 1.47 after thirteen steps of (6.3b). The MMSE
in Panel 6B was run at $\hat\beta = 1.47$. (The moment estimate of $\beta$ was $\beta^* = 1.38$.) Panel 6C depicts
an approximate MAP reconstruction obtained by ICM, with $\hat\beta = 1.47$ and using the MMSE in
Panel 6B to initialize the local minimization of the posterior energy. Characteristically, the MAP
is slightly smoother than the MMSE; on a video monitor the difference is perceptible and manifests
itself in apparently coarser transitions between grey levels in the MAP image.

Figure 6. Simulated Data, 603,144 Total Counts

The EM reconstruction after 5000 (!) steps of (3.3) is shown in Panel 6D. When (3.3) is
run with double precision, the successive iterates still continue to increase the log-likelihood (3.2)
after 5000 iterations. Panel 6E shows an MMSE run with a value of $\beta = 0.52$, which is too small
(undersmoothing). Panel 6F shows an MMSE run with a value of $\beta = 4.40$, which is too large
(oversmoothing).

Experiment 2. A total of 124,136 photons were counted from a cross-section of a patient’s torso,
including the lungs. The observed data are depicted in the so-called sinogram in Figure 7A. The
darkness in the figure is proportional to the number of detected photons. The first column of
Panel 7A corresponds to the linear detector being positioned to the right of the lung section; the
subinterval of high counts in this column is the “shadow” of the region of high isotope concentration
in the lung. The successive columns in Panel 7A correspond, in turn, to the data from the successive
sampling angles. We are using the same sampling design as in Experiment 1, with 64 equally spaced
angles $\theta$ and L = 64 lateral sampling steps on the linear detector array.

For the reconstructions, we set the linear attenuation function again at $\mu = 0.2$. The
reconstructions were done on the range [0,63] with fixed $\delta = 12$.

Panel 7B shows the EM reconstruction after 5000 steps of (3.3). The “hot spot” in the lung is
apparent, but local structure is difficult to distinguish. Panel 7C shows the MMSE reconstruction
with $\beta$ estimated at $\hat\beta = 4.56$ after four steps of the EM estimation procedure (6.3); here we
initialized $\beta^0 = 6.0$. Panel 7D shows an approximate MAP reconstruction formed by applying ICM,
setting $\hat\beta = 4.56$, and using the EM reconstruction in Panel 7B to initialize the local minimization
of the posterior energy. Again in this experiment, the MAP reconstruction is somewhat smoother
than the MMSE.

The moment estimate for $\beta$ in this example is $\beta^* = 2.71$. The moment estimate is sensitive to
sharp singularities in the isotope concentration, such as the hot spot in the lung data. We feel that
the moment method can be made more robust by using terms other than the quadratic variation
used in (6.4) for the summands that define the moment statistic. There are analytical obstacles,
however, to calculating a bias correction for alternative summands, so that the expectation of the
moment statistic, given X, is a function of differences alone, as (6.5) is.
SUMMARY

The reconstruction problem for SPECT (single photon emission computed tomography) is
formulated as a statistical estimation problem: estimate the nonhomogeneous intensity function of a
two- (or three-) dimensional Poisson process from indirect observations. Previously, this has been
addressed using the principle of maximum likelihood, but the likelihood method does not
incorporate spatial constraints. Alternatively, spatial information about the unknown intensity function
can be described by a Gibbs prior distribution and this then leads to Bayesian methods for the
reconstruction (estimation) problem. Bayesian reconstructions are described and illustrated by
examples using both real and simulated data. A parameter estimation problem for the Gibbs prior
distributions is posed. Two methods are suggested and illustrated for the subsidiary parameter
estimation problem. Computational algorithms are given.

RÉSUMÉ (English translation)

We treat the reconstruction problem for SPECT (single photon emission computed tomography)
as an estimation problem; that is, we estimate the (nonhomogeneous) intensity function
of a two- (or three-) dimensional Poisson process. Until now, this problem has been addressed
using the principle of maximum likelihood, but that method does not take spatial constraints
into account. Alternatively, spatial information about the unknown intensity function can be
expressed through a Gibbs prior distribution, which leads to a Bayesian method for the
reconstruction problem. We describe Bayesian reconstructions and give examples using both real and
simulated data. We pose questions about estimating the parameters of the Gibbs prior
distribution, and we suggest and illustrate two methods for this subsidiary parameter estimation
problem. We also give the algorithms used.
ABSTRACT


We exploit a Bayesian framework, using Gibbs priors, for finding boundaries
and for partitioning scenes into homogeneous regions. In both applications, the
prior model is a joint probability distribution for the array of pixel grey levels and
an array of “labels.” In boundary finding, the labels are binary, zero or one,
representing the absence or presence of boundary elements. In partitioning, the label
values are generic: two labels are the same when the corresponding scene locations
are considered to belong to the same region. The prior incorporates a measure
of disparity between certain spatial features of pairs of blocks of pixel grey levels,
using the Kolmogorov-Smirnov nonparametric measure of difference between the
distributions of these features. Large disparities encourage intervening boundaries
and distinct partition labels. The number of model parameters is minimized by
forbidding label configurations that are inconsistent with prior beliefs, such as those
defining very small regions, or redundant or blindly ending boundary placements.
Forbidden configurations are assigned prior probability zero. We examine the MAP
(maximum a posteriori) estimator of boundary placements and partitionings. The
forbidden states introduce constraints into the calculation of MAP configurations.
Stochastic relaxation methods are extended to accommodate constrained optimization,
and experiments are performed on some texture collages and some natural
scenes.
Boundary finding, segmentation, texture discrimination, Bayesian inference,
Gibbs distribution, Markov random field, MAP estimate, stochastic relaxation,
annealing, constrained optimization.


1.  INTRODUCTION 


Most problems in image analysis, from signal restoration to object recognition,
involve inference about physical entities, usually those in a three-dimensional
scene. These inferences (or estimates) are based on both the observed data, usually
radiant energy (or range) measurements, and general information, and imply
representations of the image in terms of unobserved attributes or label variables. The
labels may be abstract (“edge”, “class k”) or concrete (“occluding edge”, “grass”),
measurements (depth, surface normal) or semantical (“two bolts”). They may be
conclusive, or serve as intermediate data structures for further analysis, perhaps
involving additional data, assorted “sketches”, or stored models.

This work is about two such representations: partitions and boundaries. The
partition labels do not classify. Instead they are generic and are assigned to blocks
of pixels; the size of the blocks (or label resolution) depends on the resolution of
the data and intended interpretations. The boundary labels are just “on” or “off”,
and are associated with an inter-pixel sub-lattice. Two specific models are constructed
in Sections 2 and 3; these are instances of a “label model”, which will be outlined
presently, and which in turn is an application of the Bayesian paradigm in previous
work ([24], [25], [27], [33]).

Both models are applied to the problem of texture discrimination. The data
is a grey-level image consisting of textured regions, such as a mosaic of
micro-textures from the Brodatz album, a patch of rug inside plastic, or radar-imaged
ice floes in water. The goal is to find the regions, either by assigning generic
labels to the pixels, or by constructing a boundary map, which of course avoids
the microedges within the textures. The problem is more difficult than texture
identification or classification, in which we are presented with only one texture from
a given list. Discrimination can be complicated by an absence of information about
the number of textures, or about the size, shape, or number of regions. In addition,
the microedges within the textures may represent sharper intensity changes than
those associated with the texture boundaries.

There  is  no  effort  to  “model”  the  textures  (and  hence  no  capacity  for  texture 
synthesis).  Partitioning  and  boundary  placements  are  driven  by  the  observed  spatial 
statistics  as  summarized  by  selected  features.  Still,  the  labeling  is  not  unsupervised 
because  in  some  cases  we  use  “training  samples”  to  select  feature  thresholds;  see 
Sections  2  and  3.  We  experimented  with  several  classes  of  features:  the  well-known 
ones  based  on  co-occurrence  matrices  ([37])  and  new  ones  based  on  “directional 
residuals”.  The  latter  involve  third  and  higher  order  distributions,  the  conjecture  of 
Julesz  ([42])  notwithstanding.  Finally,  the  model  enjoys  some  invariance  properties, 
with  respect  to  changes  in  illumination. 

There are many applications for partitioning and boundary detection. Texture
is a dominant feature in remotely-sensed images, and regions cannot be distinguished
by methods based solely on shading, such as edge detectors or clustering
algorithms. Specifically, for example, one might wish to determine the concentration
of ice in synthetic aperture radar images of the ocean, or analyze multispectral
satellite data for land use classification. Another application is to wafer inspection:
low magnification views of memory arrays appear as highly structured textures, and
other geometries have a characteristic, but random, graining. Many other examples
and analyses of texture can be found in [16], [20], [37], [45], [48], [60], [70], [71], [72].

Texture  discrimination  can  be  regarded  as  the  detection  of  discontinuities  in 
surface  composition.  We  also  consider  the  problem  of  locating  sudden  changes  in 
depth  (occluding  boundaries)  or  shape  (surface  creases,  etc.).  The  idea  is  to  define 
contours  which  are  faithful  to  the  3-D  scene  but  avoid  the  “non-physical”  edges  due 
to  noise,  digitization,  texture,  lighting,  etc.  Obviously,  there  are  discontinuities, 
such  as  shadows,  which  are  essentially  impossible  to  distinguish  from  the  occluding 
and  shape  boundaries,  at  least  without  information  from  multiple  sensors  or  a  rich 
knowledge  base,  in  which  case  boundary  classification  becomes  possible. 

The  complications  are  well-known:  digital  edges  tend  to  be  very  “noisy”,  due 
in  part  to  the  digitization  process  itself,  but  also  to  de-focusing  and  random  effects 
in  detecting  the  photons.  The  result  is  a  variety  of  pathologies:  “true”  boundaries 
suddenly  disappear,  spurious  ones  appear  haphazardly,  and  in  general  the  surface 
transitions  are  highly  redundant. 

We formulate boundary detection as a single optimization problem, fusing the
detection of edges with their pruning, linking, smoothing, and so on. The subject
of edge detection is very active, and there has been considerable progress of late
in designing filters based on differential operators for “optimally” detecting
various “ideal” step, crease, and other edges in noise-corrupted 1-D and 2-D signals
([11], [50], [69]). Other methods detect edges after fitting smooth surfaces to the data
([36], [34], [59]), and still others ([3], [4], [13], [51], [52], [56], [68]) perform surface
reconstruction and boundary detection at the same time, and are cast in a framework
similar to the set-up in [25].

The use of boundary maps as the input to further processing is ubiquitous
in computer vision; for example, algorithms for stereopsis, optical flow, and simple
object recognition are often based on matching boundary segments. Other
applications include the analysis of medical images (e.g. angiograms and ultrasound);
automated navigation [9]; and the detection of the paths of roads and geologic faults,
or the edges of lakes, flood plains, and crop fields, in remotely-sensed images.

Bayesian Framework. Most would agree that a coherent theoretical framework
for image analysis would support more robust and more powerful algorithms
for restoration and interpretation. In this work we continue exploring an approach
based on Bayesian image models, well-defined principles of inference, and a Monte
Carlo computation theory. Exploiting this framework, or Bayesian paradigm, we
have obtained encouraging results in several areas of application, including image
restoration [25] and analysis [33], computed tomography [27], and texture and
boundary analysis [21], [24], [26], [30]. Other researchers have adopted and expanded
this (and closely related) methodologies. For example, the application in [57] to
scene segmentation based on optical flow incorporates both temporal and global
interactions and a degradation model based on sensor optics and other physical
principles. Additional examples include surface reconstruction [51], [52], scene
segmentation based on shading and texture [15], [17], and frame-to-frame matching for
computing optical flow and stereo disparity [44]. Similar models have appeared in
recent work on neural networks [38], speech [7], and remote sensing [46].

The approach is Bayesian because we construct prior probability models for
both observed and unobserved scene attributes. These models express the regularities
and preferred relations found in most real scenes, such as the unlikeliness of
“blind” endings to boundaries, or very small or thin regions, and the likeliness of
meaningful transitions at discontinuities of various spatial statistics. These regularities
are rarely deterministic; they are best expressed as correlations and likelihoods,
and we are led to the representation of our prior expectations by a “prior”
probability distribution to capture the tendencies and constraints that characterize the
particular scene of interest. Inference can then be guided by this prior distribution
together with a model for the degradation, which determines the relation between
the image attributes and the observation, usually in the form of a conditional
density of the latter given the former. If these steps are well-conceived, there are severe,
but appropriate, limits imposed on the plausible restorations or interpretations.

To  set  the  stage  for  the  applications  to  partitioning  and  boundary  finding,  we 
shall  briefly  review  the  formal  description  of  this  Bayesian  framework,  as  it  may  be 
applied  to  image  processing,  and  make  some  specializations  and  extensions  that  will 
be  needed  later.  This  will  be  self-contained,  but  we  refer  to  [25]  for  more  complete 
discussion. 

We will represent by x the (high dimensional) vector of relevant image
attributes, including, for example, the digitized pixel grey levels and the zero or
one (off or on) boundary labels. The prior distribution, $\Pi$, is a probability for x:
$0 \le \Pi(x) \le 1$ for all x, $\sum_x \Pi(x) = 1$, where $\sum_x$ is summation over all configurations of
x (all assignments of grey levels and boundary placements, for example). We adopt
the Gibbs representation, which is to say that we represent $\Pi$ as

$$\Pi(x) = \frac{1}{z} \exp\{-U(x)\}, \qquad z = \sum_x \exp\{-U(x)\}$$

The real-valued function U is called the energy, and evidently determines $\Pi$. The
Gibbs distribution describes the equilibrium of a physical system, suitably
unconstrained, that has energy U (after an appropriate scaling) as a function of the state,
x. The analogy suggests using U as a vehicle to construct $\Pi$: design an energy U
that is “small” for those configurations that are compatible with prior beliefs, but
is “large” when these beliefs are violated. Then the likely states under $\Pi$ will be
the ones that meet with prior expectations. Building U is more natural, and can
be much easier, than directly building $\Pi$ (see [2], [25], [27], [33], [52], [58] for explicit
examples).
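
As a small illustration of building a prior through its energy, the Python sketch below enumerates
a tiny configuration space and converts an energy into Gibbs probabilities. The energy shown (a
unit cost for each pair of unequal neighbors in a binary string) is a hypothetical example chosen
for its simplicity and is not one of the models developed in this work.

```python
import math
from itertools import product

def gibbs_distribution(configurations, energy):
    """Return {x: exp(-U(x)) / z}, with z the sum of exp(-U(x)) over all configurations."""
    weights = {x: math.exp(-energy(x)) for x in configurations}
    z = sum(weights.values())
    return {x: w / z for x, w in weights.items()}

def toy_energy(x):
    """Hypothetical energy: one unit of cost for each pair of unequal neighbors."""
    return sum(1.0 for a, b in zip(x, x[1:]) if a != b)

configs = list(product((0, 1), repeat=4))   # all binary strings of length 4
prior = gibbs_distribution(configs, toy_energy)
```

Configurations with lower energy receive higher probability: the constant string (0, 0, 0, 0) has
energy 0 and the alternating string (0, 1, 0, 1) has energy 3, so their probabilities differ by a
factor of $e^3$. Exhaustive enumeration is feasible only for very small configuration spaces.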

The  object  of  interest  is  x;  we  define  it  to  include  the  relevant  attributes 
for  the  particular  image  processing  task  at  hand  (such  as  the  boundary  labels  or 
the  generic  region  labels  used  herein,  the  texture  classification  labels  used  in  [30], 
isotope  intensities  for  computed  tomography  [27],  or  the  object  classifications  used 
in  [33]).  There  is  a  problem-specific  degradation  that  precludes  directly  observing  x. 
It  may  be  the  blur  and  noise  introduced  in  infrared  imaging,  the  attenuated  Radon 
transform  that  figures  into  emission  tomography,  or  simply  an  occlusion,  as  when 
pixel  grey  levels  are  observed  uncorrupted,  but  the  object  of  interest  is  the  boundary 
placements.  In  the  last  example,  the  data  comprises  only  those  components  of  x  that 
correspond  to  pixel  intensities;  the  actual  boundary  labels  are  of  course  unobserved. 
We  will  denote  the  data  (observations)  by  y.  Its  components  are  usually  pixel  grey 
levels,  but  could  also  be,  for  example,  range  data  from  laser  radar,  or  gamma 
camera  counts  from  an  emission  tomography  machine.  The  details  of  the  imaging 
mechanism define the degradation, which we formally model by specifying the
conditional distribution of $y$ (the observation) given $x$ (the “true” state): $\Pi(y|x)$.

Given the prior ($\Pi(x)$), the observation model ($\Pi(y|x)$), and the data ($y$), the
posterior distribution, $\Pi(x|y)$, is derived by Bayes’ formula:

$$\Pi(x|y) = \frac{\Pi(y|x)\,\Pi(x)}{\sum_{x'} \Pi(y|x')\,\Pi(x')}$$


It  is  useful  to  preserve  the  formal  connection  with  statistical  mechanics,  and  so  we 
write  the  posterior  distribution  in  the  Gibbs  representation: 

$$\Pi(x|y) = \frac{1}{z} \exp\{-U(x)\}$$

Of  course,  the  posterior  energy  U ,  and  the  new  normalizing  constant,  z,  may  both 
depend  on  y,  but  this  is  fixed  by  observation. 

The  goal  is  to  estimate  x,  which  may  correspond  to  restoring  a  blurred  and 
noise  corrupted  picture,  placing  boundaries,  classifying  textures,  or  perhaps  label¬ 
ing  objects,  depending  on  the  task  at  hand.  Mostly,  we  have  worked  with  two 
estimators,  the  maximum  a  posteriori  (MAP)  estimator  and  the  posterior  mean. 
The MAP estimator is any mode of the posterior distribution:

$$\hat{x} = \arg\max_x \Pi(x|y),$$

which is the Bayes estimator corresponding to the zero-one loss function

$$L(x, \hat{x}) = \begin{cases} 0 & \text{if } \hat{x} = x \\ 1 & \text{otherwise.} \end{cases}$$

On the other hand, the posterior mean

$$\hat{x} = \sum_x x\, \Pi(x|y)$$

minimizes the mean squared error, corresponding to the loss function $L(x, \hat{x}) = |x - \hat{x}|^2$.


The  appropriate  loss  function  is  necessarily  problem-specific.  For  tomography 
[27]  we  mostly  use  the  posterior  mean,  although  simulations  from  the  posterior 
distribution  are  also  very  informative.  For  the  boundary  and  generic  region  labels 
discussed  here  we  find  MAP  most  appropriate. 
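
The difference between the two estimators is easy to see on a problem small enough to enumerate. The sketch below is an illustration under stated assumptions, not the procedure used in the experiments: the three-site binary state, the smoothness prior `smooth_energy`, and the independent "flip" observation model `flip_likelihood` are all hypothetical choices.

```python
import itertools
import math

def smooth_energy(x):
    """Hypothetical prior energy: 2 for every unequal neighboring pair."""
    return 2.0 * sum(1 for a, b in zip(x, x[1:]) if a != b)

def flip_likelihood(y, x, eps=0.2):
    """Assumed degradation: each site is observed flipped with probability eps."""
    return math.prod(eps if a != b else 1.0 - eps for a, b in zip(y, x))

def posterior(prior_energy, likelihood, y, configs):
    """Bayes' formula by enumeration: Pi(x|y) is proportional to Pi(y|x) exp(-U(x))."""
    w = {x: likelihood(y, x) * math.exp(-prior_energy(x)) for x in configs}
    total = sum(w.values())
    return {x: v / total for x, v in w.items()}

def map_estimate(post):
    """Any mode of the posterior (zero-one loss)."""
    return max(post, key=post.get)

def posterior_mean(post):
    """Componentwise mean of the posterior (squared-error loss)."""
    n = len(next(iter(post)))
    return [sum(x[i] * p for x, p in post.items()) for i in range(n)]

configs = list(itertools.product([0, 1], repeat=3))
post = posterior(smooth_energy, flip_likelihood, (1, 0, 1), configs)
x_map = map_estimate(post)
x_mean = posterior_mean(post)
```

With the observation `(1, 0, 1)`, the assumed prior outweighs the middle observation and the MAP estimate is the constant state `(1, 1, 1)`; the posterior mean is a vector of fractional values rather than a valid labelling, which is one reason the choice of loss function depends on the task.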

There  is  skepticism  about  the  MAP  estimator:  see  e.g.  [2], [19], [52].  It  has 
sometimes  been  found  to  be  too  “global”,  leading  to  gross  mislabeling  in  certain 
classification  problems  and  “over-smoothing”  in  surface  reconstruction  and  image 
restoration.  (See  [10]  for  a  different  view.)  The  discussion  paper  of  Besag  [2]  has 
shed  much  light  on  the  subject;  see  especially  the  remarks  of  Silverman  [65]  on  MAP 
vs. simulations from the posterior $\Pi(x|y)$, and the remarkable comparisons between
the exact MAP estimate and approximations derived from simulated annealing in
the commentary of Greig, Porteous, and Seheult [32]. However, pixel-based error
measures are too local for boundary analysis. In particular, the Bayes rule based
on misclassification error rate, namely the marginal (individual component) modes
of $\Pi(x|y)$, is unsuitable because this estimator lacks the fine structure we expect of
boundary  maps;  placement  decisions  cannot  be  based  on  the  data  alone  -  pending 
labels  (i.e.  context)  must  be  considered.  See  [52], [63],  and  [73]  for  discussions  of 
alternative  loss  functions  and  performance  criteria. 

Actually  computing  samples,  means,  and  modes  is  usually  impossible,  at  least 
with  today’s  hardware.  For  approximations,  we  use  a  variation  of  the  Metropolis 
algorithm  [53]  that  we  call  stochastic  relaxation  (SR).  This  is  a  highly  parallel 
Monte  Carlo  algorithm  that  loosely  simulates  the  approach  to  equilibrium  of  an 
imagined  system  with  energy  U.  Later,  we  will  have  more  to  say  about  SR  and 
certain extensions, and a full account can be found in [23] and [25]. For now, suffice
it  to  say  that,  asymptotically  at  least,  SR  can  be  used  to  sample  from  the  posterior, 
or  to  compute  its  mean  and  mode. 
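
As a rough illustration of the idea, the sketch below resamples one site at a time from its local conditional distribution under a Gibbs measure. It is a simplified serial version under assumptions of our own: the one-dimensional chain, the energy `chain_energy`, the fixed temperature, and the raster visiting order are hypothetical, and the full energy is recomputed at every step for clarity rather than speed.

```python
import math
import random

def chain_energy(x):
    """Hypothetical energy: number of unequal neighboring pairs in a 1-D chain."""
    return sum(1.0 for a, b in zip(x, x[1:]) if a != b)

def stochastic_relaxation(energy, x0, labels, sweeps, temperature=1.0, seed=0):
    """Visit sites in turn; resample each from its local conditional distribution."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(sweeps):
        for s in range(len(x)):
            energies = []
            for v in labels:
                x[s] = v
                energies.append(energy(x) / temperature)
            low = min(energies)  # subtract the minimum for numerical stability
            weights = [math.exp(-(e - low)) for e in energies]
            x[s] = rng.choices(labels, weights=weights)[0]
    return x

sample = stochastic_relaxation(chain_energy, [0, 1, 0, 1, 0, 1], labels=[0, 1], sweeps=50)
```

Because each update depends only on the sites that interact with the one being changed, sites that do not interact can in principle be updated simultaneously, which is the source of the parallelism mentioned above.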

We  have  found  it  convenient,  especially  when  working  with  boundaries  and 
partitionings, to extend this framework by allowing “infinite energies” (zero
probabilities) in the prior distribution. (See Moussouris [55] for an analysis of Gibbs
measures with “forbidden” states.) Rather than inhibiting, by high energy, “blind”
boundary endings and redundant boundary representations, or a partitioning into
excessively small or thin regions, we simply disallow, or forbid, these configurations.
Later, we will define a function $V(x)$ that essentially counts the number of
subconfigurations among the labels that are forbidden. More generally, we let $V(x)$ be


nonnegative and consider the Gibbs prior on the (allowed) set $\{x : V(x) = 0\}$:

$$\Pi(x) = \frac{1}{z}\, \delta_{\{V=0\}}(x) \exp\{-U(x)\}, \qquad z = \sum_x \delta_{\{V=0\}}(x) \exp\{-U(x)\}$$

Whatever  the  degradation  model,  the  posterior  distribution  will  be  similarly  re¬ 
stricted,  and  of  the  form 

$$\Pi(x|y) = \frac{1}{z}\, \delta_{\{V=0\}}(x) \exp\{-U(x)\}, \qquad z = \sum_x \delta_{\{V=0\}}(x) \exp\{-U(x)\}$$


The  constraint,  V(x)  =  0,  amounts  to  a  placement  of  infinite  energy  barriers 
in  the  “energy  landscape”.  These  inhibit  the  free  flow  that  is  essential  to  the  good 
performance  of  SR;  indeed,  the  theory  will  in  general  break  down,  and  convergence 
is  no  longer  guaranteed.  A  simple  and  effective  solution  is  to  introduce  these  barriers 
gradually  during  the  relaxation  process.  This  will  be  made  precise  in  §4,  with  the 
supporting convergence theory, which is quite straightforward, laid out in [23].

We  now  specialize  to  the  partitioning  and  boundary  placement  applications,  in 
which the relevant attributes are pixel grey levels and labels, the latter either
representing boundary elements or regions. To make this explicit we write $x = (x^L, x^P)$,
where $x^L$ is the vector of boundary or region labels, and $x^P$ is the vector of pixel
grey levels. Two rather different kinds of considerations will go into constructing the
prior. These will be discussed in detail shortly, but the upshot is that we separate
the prior energy into a pixel-label interaction term and a pure label contribution.
The former, $U(x^L, x^P)$, promotes placements of boundaries, or assignments of
distinct labels, between regions in the image that demonstrate distinct spatial patterns.
The pure label contribution is to inhibit “blind” endings of boundaries, redundant
boundary representations, small regions, and other unexpected label configurations.
As discussed previously, the simplest way to avoid these unwanted configurations is
to forbid them by introducing $V = V(x^L)$ and concentrating on $\{x : V(x^L) = 0\}$.
The prior, then, is of the form

$$\Pi(x) = \frac{1}{z}\, \delta_{\{V=0\}}(x^L) \exp\{-U(x^L, x^P)\}$$

As  for  the  degradation,  in  this  paper  we  shall  concentrate  on  the  common 
situation  in  which  our  observations  of  the  pixel  grey  levels  are  essentially  uncor¬ 
rupted:  y  =  xp.  There  is  no  significant  blur  or  noise,  and  hence  no  need  for  grey 
level  restoration.  Our  only  interest  is  in  estimating  the  unobserved  label  process 
$x^L$. $\Pi(y|x)$ is singular, and the posterior reduces to

$$\Pi(x|y) = \frac{1}{z}\, \delta_{\{x^P=y\}}(x^P)\, \delta_{\{V=0\}}(x^L) \exp\{-U(x^L, x^P)\},$$

$$z = \sum_{x^P, x^L} \delta_{\{x^P=y\}}(x^P)\, \delta_{\{V=0\}}(x^L) \exp\{-U(x^L, x^P)\}$$

Since $x^P = y$ is fixed by observation, we equivalently treat

$$\Pi(x^L|y) = \frac{1}{z}\, \delta_{\{V=0\}}(x^L) \exp\{-U(x^L, y)\}, \qquad z = \sum_{x^L} \delta_{\{V=0\}}(x^L) \exp\{-U(x^L, y)\}$$

derived from the prior

$$\Pi(x^L, y) = \frac{1}{Z}\, \delta_{\{V=0\}}(x^L) \exp\{-U(x^L, y)\}, \qquad Z = \sum_{x^L, y} \delta_{\{V=0\}}(x^L) \exp\{-U(x^L, y)\}.$$

The  superscript  L  is  now  superfluous,  and  we  will  henceforth  simply  use  x  when 
referring  to  the  label  process. 

Label Model: General Form. Let $x = \{x_s, s \in S\}$ and $y = \{y_{ij}, 1 \le i,j \le N\}$
denote, respectively, the labels and the data; thus $x_s$ is the label at “site” $s \in S$
and $y_{ij}$ is the grey-level at pixel $(i,j)$. The set $S$ of label sites is a regular
lattice, distinct from that of the pixels, and typically more sparse; the coarseness
depends on the label resolution $\sigma$. For partitioning, we associate each site $s \in S$ with
a block of pixels, “sitting below it,” if we were to stack the label lattice on top of
the pixel lattice. In the boundary model, pairs of nearby sites in $S$ define boundary
segments, and these are associated with pairs of pixel blocks, sitting “across” from
each other, with respect to the segments (see Figure 6). Later, we will define a
neighborhood system for $S$ such that the bonding is nearest-neighbor (relative to $\sigma$)
in the boundary model, whereas in the region model there are interactions at all
scales. This has important consequences for the distribution of local minima in the
“energy landscape”; see §2. Other energy functionals with global interactions can
be found in [27], [31], and [57].

The “interaction” between $x$ and $y$ is defined in terms of an energy function

$$U(x,y) = \sum_{\langle s,t \rangle} \Psi_{s,t}(x)\, \Phi_{s,t}(y)$$

The summation extends over all “neighboring pairs” (or “bonds”) $\langle s,t \rangle$.
$\Phi_{s,t}(y)$ is a measure of the disparity between the two blocks of pixel data associated
with the label sites $s,t \in S$. $\Psi_{s,t}(x)$ depends only on the labels $x_s$ and $x_t$. In fact,
we simply take $\Psi_{s,t}(x) = 1 - x_s x_t$ in the boundary model and $\Psi_{s,t}(x) = \delta_{\{x_s = x_t\}}$ in
the partition model. In this way, in the “low energy states”, large disparities ($\Phi > 0$)
will typically be coupled with an active boundary ($x_s = x_t = 1$) or dissimilar region
labels ($x_s \ne x_t$) and small disparities ($\Phi < 0$) will be coupled with an inactive
boundary ($x_s = 0$ or $x_t = 0$) or equal region labels ($x_s = x_t$).

The  interaction  between  the  labels  and  the  data  is  based  on  various  dispar¬ 
ity  measures  for  comparing  two  (possibly  distant)  blocks  of  image  data.  These 
measures  derive  from  the  raw  data  as  well  as  from  various  transformations.  We 
experiment with several transformations $y \to y'$ for texture analysis, for example
transformations of the form

$$(1.1) \qquad y'_t = \Big| y_t - \sum_j a_j y_{t_j} \Big|$$

where $\sum_j a_j = 1$ and $\{t_j\}$ are pixels nearby to pixel $t$ and in the same row, column,
or diagonal. We call these “directional residuals”, regarding $\sum_j a_j y_{t_j}$ as a “predictor”
of $y_t$. We have also experimented with a variety of transforms suggested in
[37]. Disparity is measured by the Kolmogorov-Smirnov statistic (or distance), a
common tool in nonparametric statistics which has desirable invariance properties.
(In particular, using the directional residuals, the disparity measure is invariant to
linear distortions ($y_{ij} \to a y_{ij} + b$) of the raw data, and using the raw data itself for
comparisons, the disparity measure is invariant to all monotone (data)
transformations.) The general form of the $\Phi$ term is then

where $\phi$ is monotone increasing, $\rho$ denotes a distance based on the
Kolmogorov-Smirnov statistic (see §2), and $y^{(i)}_s$, $y^{(i)}_t$ are the data in the two blocks associated
with $\langle s,t \rangle$ for the $i$th transform. Often, we simply take $m = 1$ and $y^{(1)} = y$.
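
The two ingredients, directional residuals and the Kolmogorov-Smirnov distance between sample distributions, can be sketched in a few lines. This is an illustrative sketch only: it works on a single image row, uses equal weights on the two horizontal neighbors, and omits the thresholding function $\phi$; the function names and the sample data are assumptions, not part of the model specification.

```python
import bisect

def directional_residuals(row, weights=(0.5, 0.5), offsets=(-1, 1)):
    """y'_t = |y_t - sum_j a_j * y_{t_j}| along one row, for interior pixels only."""
    assert abs(sum(weights) - 1.0) < 1e-12  # the weights a_j must sum to 1
    lo, hi = -min(offsets + (0,)), max(offsets + (0,))
    out = []
    for t in range(lo, len(row) - hi):
        pred = sum(a * row[t + d] for a, d in zip(weights, offsets))
        out.append(abs(row[t] - pred))
    return out

def ks_distance(a, b):
    """Largest gap between the two empirical distribution functions."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    for v in sa + sb:  # the supremum is attained at a data point
        fa = bisect.bisect_right(sa, v) / len(sa)
        fb = bisect.bisect_right(sb, v) / len(sb)
        d = max(d, abs(fa - fb))
    return d

block_s = [10, 12, 11, 40, 42, 41, 10, 13]   # hypothetical grey levels
block_t = [5, 5, 6, 5, 7, 5, 6, 5]
rho_raw = ks_distance(block_s, block_t)
rho_res = ks_distance(directional_residuals(block_s), directional_residuals(block_t))
```

Since the distance depends on the data only through the ordering of the pooled values, applying the same increasing transformation to both blocks leaves `rho_raw` unchanged, and an increasing linear distortion of the raw data leaves `rho_res` unchanged because it merely rescales every residual.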

Apparently  some  of  these  ideas  have  been  kicking  around  for  a  while.  For 
example,  the  Kolmogorov-Smirnov  statistic  is  recommended  in  [64],  and  reference 
is  made  to  still  earlier  papers;  more  recently,  see  [72].  Moreover,  the  distributional 
properties  of  residuals  (from  surface-fitting)  are  advocated  in  [34], [61]  for  detecting 
discontinuities.  It  is  certainly  our  contention  that  the  statistical  warehouse  is  full 
of  useful  tools  for  computer  vision. 

The  other  component  in  the  model  is  a  penalty  function  V(x)  which  counts 
the  number  of  “taboo  patterns”  in  x;  states  x  for  which  V(x)  >  0  are  “forbidden”. 
For  example,  boundary  maps  are  penalized  for  dead-ends,  “clutter”,  density,  etc. 
whereas  partitions  are  penalized  for  too  many  transitions  or  regions  which  are  “too 
small”. 

Given the observed image $y$, the MAP estimate $\hat{x} = \hat{x}(y)$ is then any solution
to the constrained optimization problem

$$(1.2) \qquad \text{minimize}_{x : V(x) = 0}\; U(x, y)$$

We seek to minimize the energy of the data-label interaction over all possible
non-forbidden label states $x$.

The  rationale  for  constrained  optimization  is  that  our  expectations  about  cer¬ 
tain  types  of  labels  are  quite  precise  and  rigid.  For  example,  most  “physical  bound¬ 
aries”  are  smooth,  persistent,  and  well-localized;  consequently  it  is  reasonable  to 
impose  these  assumptions  on  image  boundaries,  and  corresponding  restrictions  on 
partition  geometries.  Contrast  this  with  other  inference  problems,  for  example 
restoring  an  image  degraded  by  blur  and  noise.  Aside  from  constraints  derived 
from  scene-specific  knowledge,  the  only  reasonable  generic  constraints  might  be 
“piecewise  continuity”,  and  generally  the  degree  of  ambiguity  favors  more  flexible 
constraints,  such  as  those  in  U,  or  in  the  energy  functions  used  in  [25]  and  [27]. 

As  mentioned  earlier,  the  search  for  x  is  by  a  version  of  stochastic  relaxation 
which  incorporates  rigid  constraints.  The  theoretical  foundations  are  laid  out  in 
[23], although there is enough information provided here to keep this paper
self-contained; see §4. Basically, we simulate annealing ([12], [47]) by introducing a
control parameter $t$ corresponding to “temperature”, and another control
parameter $\lambda$, corresponding to a Lagrange multiplier for the constraint $V = 0$. More
specifically, let

$$U_k(x) = t_k^{-1} [U(x,y) + \lambda_k V(x)]$$

where $y$ (the data) is fixed, $t_k \searrow 0$, and $\lambda_k \nearrow \infty$. The algorithm generates a
sequence of states $x_k$, $k = 1, 2, \ldots$, by Monte Carlo sampling from the local
conditional distributions of the Gibbs measures with energy functions $U_k$. Under suitable
conditions (see §4), the sequence $x_k$ “converges” to a solution of (1.2).
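
A schematic version of this procedure is sketched below for a one-dimensional binary label chain. It is only an illustration: the schedules for $t_k$ and $\lambda_k$, the data-fit energy `data_energy`, and the taboo-pattern counter `isolated_sites` are hypothetical choices made for the example, and they are not the schedules or penalties for which convergence is established in §4.

```python
import math
import random

def isolated_sites(x):
    """Hypothetical V(x): interior sites that differ from both of their neighbors."""
    return sum(1 for i in range(1, len(x) - 1) if x[i] != x[i - 1] and x[i] != x[i + 1])

def constrained_annealing(U, V, x0, labels, sweeps, seed=0):
    """Sample site by site from the Gibbs measure with energy (U + lam_k * V) / t_k."""
    rng = random.Random(seed)
    x = list(x0)
    for k in range(1, sweeps + 1):
        t_k = 1.0 / math.log(1.0 + k)  # assumed temperature schedule, decreasing to 0
        lam_k = 0.5 * k                # assumed multiplier schedule, increasing without bound
        for s in range(len(x)):
            energies = []
            for v in labels:
                x[s] = v
                energies.append((U(x) + lam_k * V(x)) / t_k)
            low = min(energies)        # subtract the minimum for numerical stability
            weights = [math.exp(-(e - low)) for e in energies]
            x[s] = rng.choices(labels, weights=weights)[0]
    return x

y = [0, 0, 1, 0, 0, 1, 1, 1]           # hypothetical observed data

def data_energy(x):
    """Hypothetical U(x, y): number of sites whose label disagrees with the data."""
    return sum(1 for a, b in zip(x, y) if a != b)

x_hat = constrained_annealing(data_energy, isolated_sites, y, labels=[0, 1], sweeps=30)
```

Early sweeps move almost freely across configurations that violate the constraint; as $\lambda_k$ grows the forbidden patterns become increasingly costly, so the barriers are introduced gradually rather than imposed from the start.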

The  algorithm  is  computationally  demanding  but  has  the  same  potential  for 
parallel implementation as standard stochastic relaxation. The experiments here
were  performed  on  serial  machines  but  required  considerably  less  processing  time 
than  those  in  [25],  for  example,  due  to  lower  resolution  labels,  departures  from 
the  “correct”  annealing  schedules,  and  deterministic  approximations  akin  to  those 
in  [2],  [IS],  and  [27].  A  “fast  annealing”  algorithm  is  reported  in  [67].  In  any 
event,  too  much  fuss  over  CPU  times  may  be  ill-advised.  Software  engineers  know 
that  it  is  often  possible  to  achieve  order-of-magnitude  speed-ups  by  some  modest 
reworkings  and  compromises  when  dedicating  a  general  purpose  algorithm  to  a 
specific  task,  and  this  has  certainly  been  our  experience.  Besides,  advances  in 
hardware  are  systematically  underestimated.  It  is  reported  in  [58]  that  experiments 
in  [25]  requiring  several  hours  of  VAX  11/780  time  were  reproduced  in  less  than 
one  minute  on  the  ICL  DAP,  and  the  authors  speculate  about  real-time  stochastic 
relaxation. 

There  are  no  multiplicative  parameters  in  the  model,  such  as  the  “smoothing” 
or “weighting” parameters in [3], [21], [24], [25], and [27]; in effect, the energy is
$U + \lambda V$ with $\lambda = \infty$. Thresholds must be selected for the disparity measures, but
much  of  this  can  be  data-driven  (see  §5),  and  fortunately  the  performance  is  not 


unduly  sensitive  to  these  choices  within  some  range.  Other  inputs  include  the 
label  resolution,  block  sizes,  and  penalty  patterns.  The  model  is  robust  against 
these  choices  as  well,  so  long  as  modest  information  about  the  pixel  resolution  is 
available. 

There  are  close  ties  with  “regularization  theory”  for  “computational  vision” 
([52], [62])  and  even  closer  ones  with  “variational”  approaches,  such  as  those  of 
Mumford  and  Shah  [56],  Blake  [3],  Blake  and  Zisserman  [4],  and  Terzopoulos  [6£], 
which  incorporate  discontinuity  constraints  and  penalties.  Perhaps  the  main  dif¬ 
ference  is  the  separation  of  the  energy  components  into  terms  corresponding  to  the 
prior  and  degradation  models;  in  particular,  we  regard  the  energy  function  as  the 
(negative)  log-likelihood  of  the  posterior  distribution  and  our  optimization  proce¬ 
dures  are  strongly  motivated  by  this  viewpoint.  Stochastic  relaxation  permits  us 
to  analyze  the  posterior  distribution,  revealing  its  likely  and  unlikely  states.  For 
instance, the posterior mean, $E(x|y)$, is not a property of the energy per se, but
is  an  excellent  estimate  in  some  cases  ([27]).  And,  identifying  the  “regularization 
term”  as  the  (negative)  log-likelihood  of  the  prior  provides  a  statistical  framework 
for  estimating  the  regularization  parameter  ([27]),  as  well  as  other  parameters  in 
the  model  ([26], [22], [74]). 

Sources  of  Information.  All  information  bearing  on  the  labeling  is  encoded 
in  the  posterior  distribution,  or,  equivalently,  in  the  (posterior)  energy  function 
and  constraints.  There  is  no  pre-  or  post-processing.  The  final  estimate  x  =  x(y) 
is  totally  a  function  of  the  model  and  the  data.  In  particular,  if  the  energy  does 
not  account  for  any  global  image  attributes,  e.g.  templates  or  semantical  variables, 
then  there  is  no  “top-down”  or  “goal-directed”  component  to  the  search  process. 
Such  is  the  case  in  this  work;  we  are  currently  investigating  the  capacity  of  this 
methodology  for  integrating  “high-level”  information. 

On  the  other  hand,  Markov  random  field  (equivalently,  Gibbs)  priors  have 
proven  well-suited  to  cooperative  processing.  For  example,  several  tasks  can  be 
effectively  linked,  such  as  simultaneous  surface  interpolation  and  boundary-finding 
[51], [52],  or  simultaneous  filtering  and  deconvolution  [25].  (See  also  [57].)  More 
to  the  point,  a  single,  complex  task  may  involve  a  number  of  sub-procedures.  For 
example,  boundary  detection  involves  seeding,  organization,  and  smoothing.  These 
sub-procedures are usually performed sequentially; here they are fully coupled.

As  we  have  already  mentioned,  we  consider  only  one  data  source,  a  single 
frame of visible light or L-band synthetic aperture radar. It would be desirable
to  incorporate  data  from  motion,  multiple  views,  or  multiple  sensors,  and  we  are 
currently  studying  an  expanded  version  of  these  models  utilizing  both  optical  and 
range  data  for  boundary  classification  and  other  applications.  See  §6  for  additional 
remarks  about  generalizations  of  the  model  and  [54]  for  a  thoughtful  discussion 
about  multivariate  data. 


2.  PARTITION  MODEL 


Partitionings. Denote the pixel (image) lattice $\{(i,j) : 1 \le i,j \le N\}$ by $S_I$
and let $S_L$ (formally $S$ in §1) be the label lattice, just a copy of $S_I$ in the case of the
partition model. For each experiment, a resolution $\sigma$ is chosen, which determines
a sub-lattice $S_L^{(\sigma)} \subset S_L$, and the coarseness of the partitioning. Larger $\sigma$’s will
correspond to coarser partitionings and give more reliable results (see §5), but they
lose boundary detail. Specifically, let

$$S_L^{(\sigma)} = \Big\{ (i\sigma + 1, j\sigma + 1) : 0 \le i,j \le \frac{N-1}{\sigma} \Big\}.$$

Recall that the observation process, or data, consists of grey levels $y_s$, $s \in S_I$. With
the usual grey-level discretization, the state, or configuration, space for the data is

$$\Omega_I = \{ \{y_s\} : s \in S_I,\ 0 \le y_s \le 255 \}$$

The configuration space for the partitioning, $x$, is determined by $\sigma$, and by a
maximum number of allowed regions, $P$:

$$\Omega_L^{(\sigma,P)} = \{ \{x_s\} : s \in S_L^{(\sigma)},\ 0 \le x_s \le P-1 \}.$$

Recall that the labels are generic: $x$ defines a partitioning by identifying sites with
a given label ($0, 1, \ldots, P-1$) as belonging to the same region. Only the sub-lattice
$S_L^{(\sigma)}$ is labelled, and a maximum number of labels (regions) is fixed a priori. A
prior  estimate  of  the  number  of  distinct  (but  not  necessarily  connected)  regions 
must  be  available,  since  the  model  often  subdivides  homogeneous  regions  when  P 
is  too  large  (see  §5).  The  boundary  model  (§3)  is  more  robust  in  this  regard. 

Each label site $s \in S_L$ is associated with a square block $D_s \subset S_I$ of pixel
sites centered at $s$. (Recall that $S_L$ is just a copy of $S_I$; we sometimes use “s”
ambiguously to reference a site in $S_L$ and the corresponding site in $S_I$.) $x_s$ labels
the pixels in $D_s$: $\{\{y_r\} : r \in D_s\}$. As we will see shortly, the partitioning is based on
the spatial statistics of these (overlapping) sub-images. The size of $D_s$ is therefore
important. We have experimented only with textures (the boundary model has been
applied more generally), and it is obvious that for these the pixel blocks $\{D_s\}_{s \in S_L^{(\sigma)}}$
must be large enough to capture the characteristic pattern of the texture, at least in
comparison to the other textures present. Of course, “large enough” is with respect
to the features used, but in the absence of a multiscale analysis, an a priori choice
of scale is unavoidable. In all of our experiments, $|D_s| = 441$, a $21 \times 21$ square block
of  pixels.  There  is  again  a  resolution  issue:  larger  blocks  characterize  the  textures 
more  reliably,  having  less  within-region  variation,  but  boundary  detail  is  sacrificed. 

Label-Data Interaction. We establish a neighborhood system on the label
lattice $S_L^{(\sigma)}$: each $s \in S_L^{(\sigma)}$ is associated with a set of neighbors $N_s \subset S_L^{(\sigma)}$. The
system is symmetric, meaning that $s \in N_r \Leftrightarrow r \in N_s$. As we shall see, the neighborhood
system largely determines the computational burden. For now we will proceed
as  though  the  neighborhood  system  is  given,  but  we  will  have  much  more  to  say 
about  it  shortly. 

Let $\langle s,t \rangle_\sigma$ denote a neighbor pair, meaning $s,t \in S_L^{(\sigma)}$ and $s \in N_t$. We
will introduce a disparity measure $\Phi_{s,t} = \Phi_{s,t}(y)$ for each neighbor pair $\langle s,t \rangle_\sigma$.
Roughly speaking, $\Phi_{s,t}$ measures the similarity between the pixel grey levels in the
two pixel blocks associated with $s,t \in S_L^{(\sigma)}$. For the partition model, $\Phi_{s,t}$ is simply
$-1$ (“similar”) or $+1$ (“dissimilar”); it is more complicated for the boundary model.
The interaction energy is then

$$U(x,y) = \sum_{\langle s,t \rangle_\sigma} \delta_{\{x_s = x_t\}}\, \Phi_{s,t}(y)$$

In the low energy states, similar (resp. dissimilar) pairs, $\Phi_{s,t} = -1$ (resp. $\Phi_{s,t} = +1$),
are associated with identical (resp. distinct) labels: $x_s = x_t$ (resp. $x_s \ne x_t$).

Although  U(x,  y )  is  conceived  of  as  the  interaction  term  in  a  prior  distribution  that 
is  jointly  on  x  and  y,  only  the  posterior  distribution  is  actually  used,  and  y  is  fixed 
by  observation.  It  would  be  interesting,  and  perhaps  instructive  (see  [43]),  to  sample 
from  the  joint  distribution,  but  computationally  very  expensive. 
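
For a fixed observation, the interaction energy of the partition model is a sum over the neighbor pairs whose two labels agree. A minimal sketch, with a hypothetical three-site label lattice and hand-assigned disparities in place of values computed from image data:

```python
def interaction_energy(x, phi, pairs):
    """U(x, y) = sum over neighbor pairs <s,t> of delta(x_s == x_t) * Phi_{s,t}(y)."""
    return sum(phi[(s, t)] for (s, t) in pairs if x[s] == x[t])

# Hypothetical example: sites 0 and 1 look similar, sites 1 and 2 look dissimilar.
pairs = [(0, 1), (1, 2)]
phi = {(0, 1): -1, (1, 2): +1}

u_good = interaction_energy([0, 0, 1], phi, pairs)    # agrees with both disparities
u_merged = interaction_energy([0, 0, 0], phi, pairs)  # merges the dissimilar pair
u_split = interaction_energy([0, 1, 1], phi, pairs)   # splits the similar pair
```

The labelling that agrees with both disparities has the lowest energy of the three, as the description of the low energy states suggests.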

Neighborhood System. A simple example will serve to highlight the issues.
Suppose $y$ has $R$ constant grey-level (untextured) regions ($y_s \in \{0, 1, \ldots, R-1\}$, $s \in S_I$),
and $\sigma = 1$ (full resolution). Of course, in this case $y$ is a labelling, so there is no
point in bringing in the partition process $x$; but this is just an illustrative example.
The obvious disparity measure is simply $\Phi_{s,t} = -1$ if $y_s = y_t$, and $+1$ otherwise:

$$(2.1) \qquad U(x,y) = \sum_{\langle s,t \rangle_1} \delta_{\{x_s = x_t\}} \big( \delta_{\{y_s \ne y_t\}} - \delta_{\{y_s = y_t\}} \big).$$

Entertain, for the time being, a nearest neighbor system on $S_L$, which is the natural
choice. To be concrete, take $N_s$ to be the four (two horizontal and two vertical)
nearest neighbors of $s$. There are three essential difficulties with this choice of
neighborhood  system.  Two  can  be  readily  appreciated: 

• (See Figure 1.) If $R = 2$ and $P = 3$, and if region “0” (i.e. $\{s \in S_I : y_s = 0\}$)
is split into two disjoint pieces by region “1” (i.e. $\{s \in S_I : y_s = 1\}$), then
(2.1) has two kinds of global minima: correct labellings, in which there are two
populations of labels corresponding to the two grey-level regions; and spurious
labellings, in which the three regions (two of type “0” and one of type “1”) are
given three distinct labels.

• (See Figure 2.) If $R = 3$, and region “0” does not neighbor region “2”, then
there are again two kinds of global minima: correct labellings have three labels;
spurious labellings have only two, incorrectly identifying regions “0” and “2”.
FIGURE  1
0  0  0  1  1  0  0
0  0  0  1  1  0  0
0  0  1  1  1  0  0
0  0  1  1  0  0  0 
0  1  1  0  0  0  0 
0  1  1  0  0  0  0 
1  1  1  0  0  0  0 


Left:  original  “image”  and  a  correct  labelling 
Middle:  a  correct  labelling 
Right:  spurious  labelling 

FIGURE  2 

Quite  obviously,  the  model  requires  more  global  interactions.  In  particular, 
just  a  few  long  range  interactions  would  disambiguate  the  correct  from  the  spurious 
labellings.  Only  a  correct  labelling  would  achieve  the  global  minimum  of  U  in  these 
two  examples. 


The  third  difficulty  with  local  neighborhoods  is  computational,  and  is  already 
apparent  when  R  =  1  and  P  =  2.  This  time  there  are  only  two  global  minima, 
and each is a desirable labelling ($\{x_s = 0\ \forall s \in S_L\}$ or $\{x_s = 1\ \forall s \in S_L\}$). But,
with $N = 512$, for example, consider the label configuration in which $x_{ij} = 0$
whenever $1 \le i \le 256$ and $x_{ij} = 1$ whenever $257 \le i \le 512$, a half “black” and
half “white” picture. This is a local minimum, and rather severe in that it would
take very many “uphill” or “flat” moves (single site changes) to arrive at either
of the global minima. SR is a local relaxation algorithm, and despite the various
convergence theorems, the practical fact of the matter is that “wide” local minima
such as these are impossible to cope with. (But, there are some encouraging results
in the direction of multiscale relaxation, see [29], [6], [66].) Many readers will be
reminded of the Ising model, in the absence of an external field, and the notorious
difficulty of finding its (two) global minima by Monte Carlo relaxation. In fact,
the $R = 1$, $P = 2$ energy landscape is identical to that of the Ising model, as is
readily demonstrated by a suitable transformation of the label variables. (Indeed,
the same goes for the $R = 2$, $P = 2$ case, although this is less obvious. A suitable
transformation identifies the two Ising minima with the two acceptable labellings:
$x_s = 0 \Leftrightarrow y_s = 0$ and $x_s = 1 \Leftrightarrow y_s = 0$.)
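
The half “black”, half “white” configuration can be checked directly on a small lattice. The sketch below assumes the $R = 1$, $P = 2$ case with four nearest neighbors, so that every disparity equals $-1$ and the energy (2.1) is minus the number of neighbor pairs with equal labels; the $8 \times 8$ lattice size and the function names are assumptions made to keep the example fast.

```python
def lattice_pairs(n):
    """Nearest-neighbor (horizontal and vertical) pairs of an n x n lattice."""
    pairs = []
    for i in range(n):
        for j in range(n):
            if i + 1 < n:
                pairs.append(((i, j), (i + 1, j)))
            if j + 1 < n:
                pairs.append(((i, j), (i, j + 1)))
    return pairs

def energy(x, pairs):
    """Energy (2.1) when y is constant: each equal-label pair contributes -1."""
    return -sum(1 for s, t in pairs if x[s] == x[t])

def is_single_site_local_min(x, pairs, labels=(0, 1)):
    """True if no change of a single label lowers the energy."""
    base = energy(x, pairs)
    for s in x:
        old = x[s]
        for v in labels:
            if v != old:
                x[s] = v
                lowered = energy(x, pairs) < base
                x[s] = old
                if lowered:
                    return False
    return True

n = 8
pairs = lattice_pairs(n)
half = {(i, j): 0 if i < n // 2 else 1 for i in range(n) for j in range(n)}
uniform = {(i, j): 0 for i in range(n) for j in range(n)}
```

Here `is_single_site_local_min(half, pairs)` holds even though `energy(half, pairs)` exceeds `energy(uniform, pairs)`: every single-site change breaks more bonds than it creates, so a purely local search cannot leave the configuration without moving uphill.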

These local minima can be mostly eliminated by introducing long range
interactions in the label lattice $S_L^{(\sigma)}$, the same remedy as for the label ambiguities. We
will  provide  a  heuristic  argument  for  the  important  role  of  long  range  interactions 
in  creating  a  favorable  energy  landscape.  In  any  case,  simulations  firmly  establish 
their  utility.  First  recall  that  the  distance  between  two  sites  in  a  graph  is  the  small¬ 
est  number  of  edges  that  must  be  crossed  in  travelling  from  one  site  to  the  other. 
Notice  that  in  the  four  nearest  neighbor  graph  (two  dimensional  lattice)  the  average 
distance  between  sites  is  large.  Correct  partitioning  requires  all  pairs  of  label  sites 
to  resolve  their  relationships  (“same”  or  “different”),  as  dictated  by  the  statistics 
of their associated pixel blocks. Of course most pairs are not neighbors. With a
local  relaxation,  such  as  SR,  the  resolution  is  achieved  by  propagating  relationships 
through  intervening  sites.  Thus  the  task  is  facilitated  by  minimizing  the  number 
of  intervening  sites,  and  a  relatively  small  number  of  long  range  connections  can 
drastically  reduce  the  typical  number  of  these. 
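
The effect of a few long range connections on graph distances can be checked numerically. The following sketch compares the average distance between sites on a four nearest neighbor lattice with the same lattice after each site has been given a few extra, randomly chosen neighbors. The lattice size, the number of random neighbors, and the sampling scheme are assumptions for illustration; in particular, this construction does not produce a graph of fixed degree and is not the generator described in [30].

```python
import random
from collections import deque

def lattice_graph(n):
    """Adjacency sets for the four nearest neighbor graph on an n x n lattice."""
    adj = {(i, j): set() for i in range(n) for j in range(n)}
    for (i, j) in list(adj):
        for (a, b) in ((i + 1, j), (i, j + 1)):
            if (a, b) in adj:
                adj[(i, j)].add((a, b))
                adj[(a, b)].add((i, j))
    return adj

def add_random_neighbors(adj, per_site, seed=0):
    """Return a copy of adj in which every site draws per_site random partners."""
    rng = random.Random(seed)
    new = {s: set(nb) for s, nb in adj.items()}
    sites = sorted(new)
    for s in sites:
        for t in rng.sample(sites, per_site):
            if t != s:
                new[s].add(t)   # keep the neighborhood system symmetric
                new[t].add(s)
    return new

def average_distance(adj):
    """Mean graph distance over all ordered pairs of distinct sites (breadth-first search)."""
    total, count = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        count += len(dist) - 1
    return total / count

base = lattice_graph(10)
augmented = add_random_neighbors(base, per_site=2, seed=1)
d_base = average_distance(base)
d_augmented = average_distance(augmented)
```

Adding edges can never lengthen a shortest path, so `d_augmented` is at most `d_base`; how large the reduction is for a given lattice size and number of random neighbors is best judged by running the comparison.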

The largest distance over all pairs of sites is the diameter of a graph. In
an appropriate limiting (large graph) sense, random graphs have minimum
diameter among all graphs of fixed degree.¹ In light of our heuristics, this suggests
a random neighborhood system for $S_L^{(\sigma)}$. Indeed, random neighborhoods have a
remarkable effect on the structure of local minima for these systems. In a series
of experiments, with “perfect” disparity data (such as the $\sigma = 1$ grey-level
problems discussed above) we could always achieve the global minimum by single-site
iterative  improvement  when  adopting  a  random  graph  neighborhood  configuration, 
using  rather  modest  degrees  for  large  graphs.  We  conjecture,  but  have  been  unable 
to  prove,  that  even  with  the  degree  a  vanishingly  small  fraction  of  the  graph  size, 
random  graphs  (in  the  “large  graph  limit”)  have  no  local  minima,  under  the  Ising 
potential  or  the  potential  U(x,y)  with  perfect  disparity  data  (2.1). 

¹ A graph has fixed degree if each site has the same number of neighbors. The
degree is then the number of neighbors per site.

Of course the disparity data is not usually perfect. In challenging texture
discrimination tasks there will be pixel blocks from the same texture that are
measured as dissimilar ($\Phi_{s,t} = 1$) and others from distinct textures that are measured as
similar ($\Phi_{s,t} = -1$). Under these circumstances it helps to also have near neighbor
interactions,  since  these  tend  to  bond  neighboring  label  sites  and  thereby  increase 
the  effective  number  of  long  range  interactions  per  site.  Although  near  neighbors 
were  not  always  needed  to  get  the  best  results,  we  settled  on  using  four  near  neigh¬ 
bors,  and  sixteen  random  neighbors  per  site,  in  each  of  our  experiments  (see  §5). 
By  “near  neighbors”  we  mean  the  closest  two  horizontal  and  two  vertical  neighbors 
whose  associated  pixel  blocks  do  not  overlap.  For  example,  with  a  —  7,  and  using 
21  x  21  blocks,  the  near  neighbors  have  two  intervening  sites  in  .  Details  on 
the  generation  of  the  (pseudo)  random  neighbors  can  be  found  in  [30].  Overall, 
perhaps  the  most  effective  neighborhood  system  would  have  a  gradual  fall-off  of 
interaction  densities  with  distance,  a  system  with  an  equal  number  of  neighbors  at 
each  Manhattan  distance,  for  example. 

Kolmogorov-Smirnov Statistic. At the heart of the partitioning and
boundary algorithms is a disparity measure $\Phi_{s,t}$. Recall that if $s, t$ are
neighbors in $S_L^{(\sigma)}$ ($\langle s,t \rangle_\sigma$) then $\Phi_{s,t}$ is a measure of
disparity between two corresponding blocks of pixel data, $\{y_r : r \in D_s\}$ and
$\{y_r : r \in D_t\}$ in the case of the partition model. We base $\Phi_{s,t}$ on the
Kolmogorov-Smirnov distance, a measure of separation between two probability
distributions, well-known in statistics. When applied to the sample distributions
(i.e. histograms) for two sets of data, say
$v^{(1)} = \{v_1^{(1)}, v_2^{(1)}, \ldots, v_{n_1}^{(1)}\}$ and
$v^{(2)} = \{v_1^{(2)}, v_2^{(2)}, \ldots, v_{n_2}^{(2)}\}$, it provides a test statistic
for the hypothesis that $v^{(1)}$ and $v^{(2)}$ are samples from the same underlying
probability distribution, meaning that $F_1 = F_2$ where, for $i = 1, 2$,
$v_1^{(i)}, \ldots, v_{n_i}^{(i)}$ are independent and identically distributed with
$F_i(t) = P(v_1^{(i)} \le t)$. The test is designed for continuous
distributions and has a powerful invariance property which will be discussed below.

The sample distribution function of a data set $\{v_1, v_2, \ldots, v_n\}$ is

$\hat{F}(t) = \frac{1}{n} \#\{k : v_k \le t\}, \quad -\infty < t < +\infty$

Thus, $\hat{F}$ is a step function, with jumps occurring at the points $\{v_k\}$. It
characterizes the histogram. Now consider two sets of data $v^{(1)}, v^{(2)}$ with sample
distribution functions $\hat{F}_1, \hat{F}_2$. The Kolmogorov-Smirnov distance (or statistic)
is the maximum (vertical) distance between the graphs of $\hat{F}_1, \hat{F}_2$, i.e.

(2.2)  $d(v^{(1)}, v^{(2)}) = \max_{-\infty < t < +\infty} |\hat{F}_1(t) - \hat{F}_2(t)|$

We write $d(v^{(1)}, v^{(2)})$ to emphasize the data (which in our case consists of blocks
of possibly transformed pixel intensity values); the conventional notation is
$d(\hat{F}_1, \hat{F}_2)$.

The invariance property is the following. Suppose $v^{(1)}, v^{(2)}$ are samples from
continuous distributions $F_1, F_2$. Then under the (“homogeneity”) hypothesis
$F_1 = F_2$, the probability distribution of $d$ (as a random variable) is independent
of the (common) underlying distribution. Basically, this stems from the fact that $d$
is invariant to strictly monotone transformations of the data, i.e.

$d(v^{(1)}, v^{(2)}) = d(\tau(v^{(1)}), \tau(v^{(2)}))$

where $\tau(v^{(i)})_j = \tau(v_j^{(i)})$ and $\tau$ is strictly increasing or decreasing. Thus,
in two sample tests for homogeneity, one rejects the null hypothesis that $F_1 = F_2$
if $d(v^{(1)}, v^{(2)}) > d^*$, where $d^*$ depends only on $n_1$ and $n_2$, and on the
significance level of the test.

For our purposes, the data $v^{(1)}$ and $v^{(2)}$ consist of either the (raw) grey levels,
or (in most cases of texture discrimination) transformations of these, restricted to
blocks of pixels; these blocks are adjacent in the boundary model, but may be well
separated in the partition model (recall that we employ a largely random topology).
In either case, the assumptions made in statistical testing are generally violated: it
may be unreasonable to assume that the grey levels in a block of pixels represent
independent and identically distributed observations from some underlying
probability distribution (although this is occasionally done). Of course, the size of
the blocks relative to the image structures is very important. The blocks may contain
hundreds of pixels, but if they are still small relative to the image structures, then
the formal assumption will be more nearly satisfied. At any rate, the formal theory
is primarily motivational. The distance (2.2) is an effective “measure of
homogeneity” which is invariant to pointwise (monotone) data transformations induced
by lighting and other factors.

Disparity Measures. Sometimes, just grey level histograms are enough for
good partitionings, as with the SAR image of water and ice (see §5). In these
cases, disparity is measured as follows. Recall that $D_s$, $s \in S_L^{(\sigma)}$, is a
square block of pixel sites (always 21 x 21 in the partitioning experiments) centered
at $s$. Let $y(D_s) = \{y_r : r \in D_s\}$. Given (possibly distant) neighbors
$s, t \in S_L^{(\sigma)}$, we define $\Phi_{s,t}$ using the Kolmogorov-Smirnov statistic and a
threshold $c$:

$\Phi_{s,t}(y) = 2\,\delta_{\{d(y(D_s),\, y(D_t)) > c\}}(y) - 1$

In other words, $\Phi_{s,t}$ is 1 or -1 depending on whether the Kolmogorov-Smirnov
statistic is above threshold or not.
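
The thresholded disparity above can be sketched in a few lines. The following is a minimal Python illustration, assuming NumPy; the names `ks_distance` and `disparity` are hypothetical, and the threshold value used with them is a free parameter, not one taken from the experiments.

```python
import numpy as np

def ks_distance(v1, v2):
    """Two-sample Kolmogorov-Smirnov distance: max over t of |F1(t) - F2(t)|."""
    v1 = np.sort(np.ravel(v1))
    v2 = np.sort(np.ravel(v2))
    # Sample distribution functions are step functions that only jump at
    # data points, so it suffices to compare them on the pooled sample.
    pooled = np.concatenate([v1, v2])
    f1 = np.searchsorted(v1, pooled, side="right") / v1.size
    f2 = np.searchsorted(v2, pooled, side="right") / v2.size
    return float(np.max(np.abs(f1 - f2)))

def disparity(block_s, block_t, c):
    """Phi_{s,t}: +1 if the KS distance of the two blocks exceeds c, else -1."""
    return 1 if ks_distance(block_s, block_t) > c else -1
```

Because the statistic depends on the data only through their ranks in the pooled sample, applying the same strictly increasing transformation to both blocks leaves `ks_distance` unchanged, which is the invariance property discussed earlier.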

Of course, many distinct textures have nearly identical histograms (see [26] for
some experiments with partitioning and classification of such textures, also in the
Bayesian framework). In these cases, discrimination will rely on features, or
transformations, that go beyond raw grey levels, involving various spatial statistics.
We use several of these at once, defining $\Phi_{s,t}$ to be 1 if the Kolmogorov-Smirnov
statistic associated with any of these transformations exceeds a
transformation-specific threshold, and -1 otherwise. The philosophy is simple: If
enough transformations are employed, then two distinct textures will differ
significantly in at least one of the aspects represented by the transformations.
Unfortunately, the implementation of this idea is complicated; more transformations
mean more thresholds to adjust, and more possibilities for “false alarms” based on
“normal” variations within homogeneous regions.

Proceeding more formally, let $A$ denote one such data transformation and
put $y' = A(y)$, the transformed image. In general, $y'_t$ is a function of both $y_t$
and the grey-levels in a window centered at $t \in S_I$. For example, $y'_t$ might be the
mean, range, or variance of $y$ in a neighborhood of $t$, or a measure of the local
“energy” or “entropy”. Or, $y'_t$ might be a directional residual defined in (1.1);
isotropic residuals, in which the pixels $\{t_i\}$ surround $t$, are also effective.
Notice that any $A$ given by (1.1) is linear in the sense that if
$y_s \to a y_s + b$, $\forall s \in S_I$, then $y'_t \to |a| y'_t$, $t \in S_I$, and recall that
the Kolmogorov-Smirnov statistic is invariant with respect to such changes. This
invariance is shared by other features, such as the mean, variance, and range. It
should also be noted that these transforms are decidedly multivariate, depending
(statistically) on the marginal distributions of the data of at least dimension three.
Many approaches to texture analysis are based solely on the one- or two-dimensional
marginals, i.e. the grey-level histogram and co-occurrence matrices. We were not able
to reliably detect some of the boundaries between the Brodatz microtextures with
these standard features. Perhaps the jury is still out.

Given a family of transformations, $A_1, A_2, \ldots, A_m$, we define

(2.3)  $\Phi_{s,t} = \max_{1 \le i \le m} \left[ 2\,\delta_{\{d(y^{(i)}(D_s),\, y^{(i)}(D_t)) > c_i\}}(y^{(i)}) - 1 \right]$

where $y^{(i)} = A_i(y)$, $1 \le i \le m$, and $y^{(i)}(D_r) = \{y_s^{(i)}, s \in D_r\}$, $r = s, t$. The
thresholds $c_1, \ldots, c_m$ are chosen to limit the percentage of “false alarms” (cases of
exceeding threshold for pairs of blocks within the same texture); see §5.
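
To make the “any transformation exceeds its threshold” rule concrete, here is a small Python sketch, assuming NumPy. The transform `horizontal_residual` is only a hypothetical stand-in for a directional residual of the kind in (1.1), and the function names, block slices, and thresholds are illustrative assumptions.

```python
import numpy as np

def ks_distance(v1, v2):
    """Two-sample Kolmogorov-Smirnov distance between two data sets."""
    v1 = np.sort(np.ravel(v1))
    v2 = np.sort(np.ravel(v2))
    pooled = np.concatenate([v1, v2])
    f1 = np.searchsorted(v1, pooled, side="right") / v1.size
    f2 = np.searchsorted(v2, pooled, side="right") / v2.size
    return float(np.max(np.abs(f1 - f2)))

def horizontal_residual(y):
    """|y_t - average of left and right neighbours| (hypothetical transform).

    Border columns are left at zero, so blocks should avoid the image border.
    """
    y = np.asarray(y, dtype=float)
    out = np.zeros_like(y)
    out[:, 1:-1] = np.abs(y[:, 1:-1] - 0.5 * (y[:, :-2] + y[:, 2:]))
    return out

def multi_disparity(y, block_s, block_t, transforms, thresholds):
    """+1 if any transform's KS distance exceeds its own threshold, else -1."""
    for transform, c in zip(transforms, thresholds):
        yi = transform(y)  # transformed image y^(i) = A_i(y)
        if ks_distance(yi[block_s], yi[block_t]) > c:
            return 1
    return -1

# Two textures with identical grey-level histograms: vertical stripes on the
# left half, horizontal stripes on the right half.
image = np.zeros((20, 40))
image[:, :20] = np.arange(20) % 2
image[:, 20:] = (np.arange(20) % 2)[:, None]
left = (slice(2, 18), slice(2, 18))
right = (slice(2, 18), slice(22, 38))
raw_only = multi_disparity(image, left, right, [lambda y: y], [0.3])
with_residual = multi_disparity(
    image, left, right, [lambda y: y, horizontal_residual], [0.3, 0.3]
)
```

In this toy mosaic the raw grey levels alone cannot separate the two blocks, while adding the residual transform does, which is the motivation for using several transformations at once.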

The  disparity  measure  (2.3)  inherits  the  aforementioned  invariance  to  linear 
shifts  for  many  transforms,  including  all  “differences  of  averages”.  More  impor¬ 
tantly,  perhaps,  imagine  we  are  comparing  two  pairs  of  image  blocks,  each  pair 
in  a  different  region  of  the  image.  Then,  roughly  speaking,  the  two  distances  are 
automatically  calibrated,  regardless  of  the  differing  statistical  properties  of  the  two 
regions; i.e. the disparity measure has the same interpretation anywhere in the
image.

Penalties. Recall that $V(x)$ counts the total number of “penalties” associated
with $x \in \Omega_L^{(\sigma,P)}$. There are two kinds of “forbidden” configurations that give
rise to penalties: roughly, these correspond to very small regions and very narrow
regions. Fix $\sigma$ and $s \in S_L^{(\sigma)}$. Let $E_s$ be the 5 x 5 block of sites in
$S_L^{(\sigma)}$ centered at $s$. A configuration $x$ is “small at $s \in S_L^{(\sigma)}$” if fewer
than nine labels in $\{x_t : t \in E_s\}$ agree with $x_s$. Notice that a right corner at $s$
is allowed; there are exactly nine agreements. The total number of penalties for
“small regions” is

(2.4)  $\sum_{s \in S_L^{(\sigma)}} \delta_{\left\{ \sum_{t \in E_s} \delta_{\{x_t = x_s\}} < 9 \right\}}$

Obviously, the numbers “5” and “9”, as well as other penalty parameters below,
are quite arbitrary, and could reasonably be scale-dependent.

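As for “thin regions”, these are regions that have a horizontal or vertical “neck”
that is only one label-site wide (at resolution $\sigma$). Let $\tau_h$ be a one-site horizontal
translation within $S_L^{(\sigma)}$, and let $\tau_v$ be the analogous vertical translation.
Penalties arise when either $\{x_{s-\tau_h} \ne x_s$ and $x_s \ne x_{s+\tau_h}\}$ or
$\{x_{s-\tau_v} \ne x_s$ and $x_s \ne x_{s+\tau_v}\}$. The number of “thin-region” penalties is
therefore

(2.5)  $\sum_{s \in S_L^{(\sigma)}} \left[ \delta_{\{x_{s-\tau_h} \ne x_s,\; x_s \ne x_{s+\tau_h}\}} + \delta_{\{x_{s-\tau_v} \ne x_s,\; x_s \ne x_{s+\tau_v}\}} \right]$

and $V(x)$ is just the sum of (2.4) and (2.5).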
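
The two penalty counts can be sketched as follows in Python, assuming NumPy and a label array `x` indexed by the label lattice. Two points are assumptions of this sketch rather than statements from the text: sites whose window or neighbors fall off the lattice are skipped, and horizontal and vertical necks at the same site are counted separately.

```python
import numpy as np

def small_region_penalties(x, window=5, min_agree=9):
    """Count sites whose window x window block has fewer than min_agree
    labels equal to the centre label (the centre itself is included)."""
    x = np.asarray(x)
    h = window // 2
    rows, cols = x.shape
    count = 0
    for i in range(h, rows - h):          # skip sites too close to the border
        for j in range(h, cols - h):
            block = x[i - h:i + h + 1, j - h:j + h + 1]
            if np.sum(block == x[i, j]) < min_agree:
                count += 1
    return count

def thin_region_penalties(x):
    """Count one-site-wide horizontal and vertical necks at interior sites."""
    x = np.asarray(x)
    centre = x[1:-1, 1:-1]
    horizontal = (x[1:-1, :-2] != centre) & (centre != x[1:-1, 2:])
    vertical = (x[:-2, 1:-1] != centre) & (centre != x[2:, 1:-1])
    return int(np.sum(horizontal) + np.sum(vertical))

def total_penalties(x):
    """V(x): small-region penalties plus thin-region penalties."""
    return small_region_penalties(x) + thin_region_penalties(x)
```

A region occupying one quadrant of the lattice, with its right-angle corner at an interior site, incurs no penalty under either count, in line with the remark that a right corner is allowed.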

Summary. We are given

(i) a grey-level image $y = \{y_{ij}\}$;

(ii) a resolution $\sigma = 1, 2, \ldots$, and a maximum number of labels, $P$;

(iii) a disparity measure $\Phi_{s,t}(y)$ for each pair $\langle s,t \rangle_\sigma$ in the sub-lattice $S_L^{(\sigma)}$;

(iv) a collection of penalty patterns.

The (MAP) partitioning $\hat{x} = \hat{x}(y)$ is then any solution $x \in \Omega_L^{(\sigma,P)}$ of the
constrained optimization

$\text{minimize}_{x : V(x) = 0} \; \sum_{\langle s,t \rangle_\sigma} \delta_{\{x_s = x_t\}} \Phi_{s,t}(y)$

where $V(x)$ is the number of penalties in $x$.


3.  BOUNDARY  MODEL 
Boundary Maps. The pixel lattice is again $S_I$. Let $S_B$ denote another
regular lattice interspersed among the pixels (see Figure 3) and of dimension
$(N - 1) \times (N - 1)$; these are the “boundary sites”. We will associate
$s = (i,j) \in S_B$ with the pixel $(i,j) \in S_I$ to the upper left of $s$.


Pixel sites (o) and boundary sites (+), N = 3
FIGURE 3


Given $y$, a grey-level image, we wish to assign values to the boundary variables
$x = \{x_s, s \in S_B\}$, where $x_s = 1$ (resp. 0) indicates the presence (resp. absence)
of a boundary at site $s \in S_B$. We have already discussed the corresponding
interpretation of the boundary map $\hat{x} = \hat{x}(y)$ in terms of physical discontinuities in
the underlying three-dimensional scene. We establish a boundary resolution or grid
size $\sigma \ge 1$, analogous to the resolution used earlier for the partition model. Let
$S_B^{(\sigma)} \subset S_B$ denote the sub-lattice
$\{(i\sigma + 1, j\sigma + 1) : 1 \le i,j \le (N - 2)/\sigma\}$. Only the variables $x_s$,
$s \in S_B^{(\sigma)}$, interact directly with the data; the remaining variables $x_s$,
$s \in S_B \setminus S_B^{(\sigma)}$, are determined by those on the “grid” $S_B^{(\sigma)}$. Figure 4
shows the grids $S_B^{(2)}$ and $S_B^{(3)}$ for $N = 8$; the sites off the grid are denoted by
dots. The selection of $\sigma$ influences the interpretation of $x$, the computational
load, the interaction range at the pixel level, and is related to the role played by
the size of the spatial filter in edge detection methods based on differential
operators. Finally, let $\Omega_I$ and $\Omega_B^{(\sigma)}$ denote the state spaces of intensity
arrays and boundary maps respectively; that is,

$\Omega_I = \{\{y_s\} : s \in S_I,\; 0 \le y_s \le 255\}, \quad \Omega_B^{(\sigma)} = \{\{x_s\} : s \in S_B^{(\sigma)},\; x_s \in \{0,1\}\}$

Sometimes, we simply write $\Omega_B$ for $\Omega_B^{(1)}$.

Boundary-Data Interaction. Let $\langle s,t \rangle_\sigma$, $s,t \in S_B^{(\sigma)}$, denote a
nearest-neighbor pair relative to the grid. Thus, $s = (i\sigma + 1, j\sigma + 1)$,
$t = (k\sigma + 1, l\sigma + 1)$ is such a (horizontal or vertical) pair if either $i = k$ and
$j = l \pm 1$, or $j = l$ and $i = k \pm 1$. We identify $\langle s,t \rangle_\sigma$ with the elementary
boundary segment consisting of the horizontal or vertical string of $\sigma + 1$ sites (in
$S_B$) including $s$, $t$ and the $\sigma - 1$ sites “in between”.

Boundary grid at resolutions $\sigma = 2$ and $\sigma = 3$
FIGURE 4

The disparity measure should gauge the intensity “flux”, $\Delta_{s,t} = \Delta_{s,t}(y) \ge 0$,
across $\langle s,t \rangle_\sigma$, i.e. orthogonal to the associated segment. We will experiment
with several types of measures. An obvious choice at high resolution ($\sigma = 1$) is
$\Delta_{s,t} = |y_{s^*} - y_{t^*}|$ where $s^*, t^* \in S_I$ are the two pixels associated with
$\langle s,t \rangle_1$; see Figure 5.

Pixel pairs (o’s) associated with horizontal
and vertical boundary segments

FIGURE 5

The analogous choice at a lower resolution ($\sigma > 1$) might be a measure of the form
$\Delta_{s,t} = m^{-1} \left| \sum_{r \in D_{s^*}} y_r - \sum_{r \in D_{t^*}} y_r \right|$ where
$D_{s^*}, D_{t^*} \subset S_I$ are adjacent blocks of pixels, of the same size ($m$) and shape,
and “separated” by $\langle s,t \rangle_\sigma$. These and other measures are discussed later on.

The energy function $U(x,y)$ should promote boundary maps $x$ which are faithful
to the data $y$ in the sense that “large” values of $\Delta_{s,t}(y)$ are associated with “on”
segments ($x_s x_t = 1$) and “small” values with “off” segments ($x_s x_t = 0$). There are
no a priori constraints on $x$ at this point; in fact, because of digitization effects,
textures, and so on, the energy $U$ will typically favor maps $x$ with undesirable
dead-ends, multiple representations, high curvature, etc. These will be penalized
later on. A simple choice for the $x/y$ interaction is

(3.1)  $U(x,y) = \sum_{\langle s,t \rangle_\sigma} (1 - x_s x_t)\, \phi\left(\alpha^{-1} \Delta_{s,t}(y)\right)$

where the summation extends over all nearest-neighbor pairs $\langle s,t \rangle_\sigma$; the
“weighting function” $\phi(x)$, $x \ge 0$, and “normalizing constant” $\alpha$ will be described
presently.
The energy in (3.1), which is similar to a “spin-glass” in statistical mechanics,
is a variation of the ones we used in our previous work ([24], [25]); when $\sigma = 1$, the
variable $x_s x_t$ corresponds directly to the “edge” or “line” variables in [25] and [57].
Since $y$ is given, the term $1 - x_s x_t$ can be replaced by $-x_s x_t$ with no change in
the resulting boundary interpretation. By contrast, in [25] we were concerned with
image restoration and regarded both $x$ and $y$ as unobservable; the data then consists
of some transformation of $y$, involving for example blur and noise. In that case, or
in conceiving $U$ as defining a prior distribution over both $y$ and $x$, the bond between
the associated pixels should be broken when the edge is active, i.e. $1 - x_s x_t = 0$. The
term $1 - x_s x_t$ is exactly analogous to the “controlled-continuity functions” in [68].
See also [43] for experiments involving simulations from a related Markov random
field model; the resulting “fantasies” $(y,x)$ are generated by stochastic relaxation
and yield insight into the nature of these layered Markov models.


Returning to (3.1), a little reflection shows that $\phi$ should be increasing, with
$\phi(0) < 0 < \phi(+\infty)$; otherwise, if $\phi$ were never negative, the energy would always
be minimized with $x_s \equiv 1$. The intercept $\beta = \phi^{-1}(0)$ is critical; values of $\Delta$
above (resp. below) the threshold $d^* = \alpha\beta$ will promote (resp. inhibit) boundary
formation. The influence of the threshold is reduced by choosing $\phi'(\beta) = 0$. We
employ the simple quadratic

(3.2)  $\phi(x) = \begin{cases} -\left(\dfrac{\beta - x}{\beta}\right)^2, & 0 \le x \le \beta \\ \left(\dfrac{x - \beta}{1 - \beta}\right)^2, & \beta < x \le 1 \end{cases}$

Notice that the maximum “penalty” ($\phi(0) = -1$) and “reward” ($\phi(1) = 1$) are
balanced if we select $\alpha \approx \max \Delta_{s,t}$.
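
As an illustration of how the weighting function and the energy (3.1) fit together, here is a hedged Python sketch. The piecewise quadratic in `phi` is an assumed form chosen to satisfy the stated properties (value -1 at 0, value 1 at 1, zero value and zero slope at the intercept `beta`); clipping the argument to [0, 1] and the helper name `boundary_energy` are further assumptions of the sketch.

```python
def phi(x, beta):
    """Assumed piecewise-quadratic weighting function on [0, 1].

    phi(0) = -1, phi(beta) = 0 with zero slope there, phi(1) = 1.
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    x = min(max(x, 0.0), 1.0)  # assumption: clip normalized flux to [0, 1]
    if x <= beta:
        return -((beta - x) / beta) ** 2
    return ((x - beta) / (1.0 - beta)) ** 2

def boundary_energy(segment_states, deltas, alpha, beta):
    """U(x, y): sum over segments of (1 - x_s * x_t) * phi(delta / alpha).

    segment_states: list of (x_s, x_t) pairs with entries in {0, 1}.
    deltas: matching list of flux values Delta_{s,t}(y) >= 0.
    """
    return sum(
        (1 - xs * xt) * phi(delta / alpha, beta)
        for (xs, xt), delta in zip(segment_states, deltas)
    )
```

An “on” segment contributes nothing to the sum, while an “off” segment contributes a negative term when its flux is below the threshold `alpha * beta` and a positive term when it is above, so minimization switches on the segments with large flux.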


Disparity Measures. We employ one type of measure for depth and shape
boundaries and another for the texture experiments. In the former case, the
disparity measure involves the (raw) grey-levels only, whereas for texture
discrimination we also consider data transforms based on the directional residuals
(1.1). Except when $\sigma = 1$, the data sets are compared by the Kolmogorov-Smirnov
distance.


At the highest resolution ($\sigma = 1$), the measure $|y_{s^*} - y_{t^*}|$ (where $s^*, t^*$ are
the two pixels associated with the boundary sites $s, t$; see Figure 5) can be effective
for simple scenes but necessitates a single differential threshold $d^* = \alpha\beta$.
Differences above $d^*$ (resp. below $d^*$) promote (resp. inhibit) boundary formation.
Typically, however, this measure will fluctuate considerably over the image,
complicating the selection of $d^*$. (Such is the case, e.g. for the “cart” scene, see
§5.) Moreover, this measure lacks any invariance properties, as will be explained
below. A more effective measure is one of the form

(3.3)  $\Delta_{s,t}(y) = \dfrac{|y_{s^*} - y_{t^*}|}{\gamma + \sum |y_{s_i^*} - y_{t_i^*}|}$

where the sum extends over parallel edges $\langle s_i, t_i \rangle$ in the immediate vicinity of
$\langle s,t \rangle$. Thus the difference $|y_{s^*} - y_{t^*}|$ is “modulated” by adjacent, competing
differences. The result is a spatially-varying threshold and the distribution of
$\Delta_{s,t}(y)$ across the image is less variable than that of $|y_{s^*} - y_{t^*}|$. Choosing
$\gamma = \text{const.} \times \bar{\Delta}$, where $\bar{\Delta}$ is the mean (raw) absolute intensity difference
over all (vertical and horizontal) bonds, renders $\Delta_{s,t}(y)$ invariant to linear
transformations of the data; that is, $\Delta_{s,t}(y) = \Delta_{s,t}(ay + b)$ for any $a, b$.

Pixel blocks, $D_{s^*}$ and $D_{t^*}$,
associated with boundary segment $\langle s,t \rangle_3$

FIGURE 6


At lower resolution, let $D_{s^*}$ and $D_{t^*}$ denote two adjacent blocks of pixels, of
equal size and shape. An example is illustrated in Figure 6 for the case of two
square blocks of size $5^2 = 25$ pixels which straddle a vertical boundary segment
with $\sigma = 3$. Let $y(D_r) = \{y_s, s \in D_r\}$, $r = s^*, t^*$, be the corresponding grey-levels
and set

(3.4)  $\Delta_{s,t}(y) = d(y(D_{s^*}), y(D_{t^*}))$,

where $d$ is the Kolmogorov-Smirnov distance discussed in §2. This is the disparity
measure used for the House and Ice Floe scenes (see §5).


One difficulty with (3.4) is that the distance between two non-overlapping
histograms is the maximum value, namely 1, regardless of the amount of separation.
Thus, two constant regions differing by a single grey level are as “far apart” as two
differing by 255 levels. It is therefore occasionally necessary to “de-sensitize” (3.4),
for example by “smearing” the data or perhaps adding some noise to it; see §5.


Raw grey-level data is generally not satisfactory for discriminating textures.
Instead, as discussed in §2, we base the disparity measure on several data
transformations, involving higher order spatial statistics, such as the directional
residuals defined in (1.1). Given a family $A_1, A_2, \ldots, A_m$ of these transforms (see
§2), a resolution $\sigma$, and blocks $D_{s^*}, D_{t^*}$ as above, define

(3.5)  $\Delta_{s,t}(y) = \max_{1 \le i \le m} c_i^{-1}\, d(y^{(i)}(D_{s^*}), y^{(i)}(D_{t^*}))$

where $y^{(i)} = A_i(y)$, $1 \le i \le m$, and $y^{(i)}(D_r) = \{y_s^{(i)}, s \in D_r\}$, exactly as in §2.
Then $\Delta_{s,t}(y) > d^*$ (and, hence, $\phi(\alpha^{-1} \Delta_{s,t}(y)) > 0$) if and only if
$d(y^{(i)}(D_{s^*}), y^{(i)}(D_{t^*})) > d^* c_i$ for some transform $i$. The thresholds
$c_1, \ldots, c_m$ are again chosen to limit “false alarms”. Finally, we note that (3.5) has
the same desirable invariance properties as the measure constructed for partitioning
(§2).

Penalties. $V(x)$ again denotes the total number of “penalties” associated
with $x \in \Omega_B^{(\sigma)}$. These penalties are simply local binary patterns over subsets of
$S_B^{(\sigma)}$. Figure 7 illustrates a family of four such patterns; they can be associated
with any resolution $\sigma$ by the obvious scaling.

Forbidden patterns
FIGURE 7

These correspond, respectively, to an isolated or abandoned segment, sharp turn,
quadruple junction, and “small” structure. Depending on $\sigma$, the pixel resolution,
and scene information, we may or may not wish to include the latter three. For
example, including the last one with $\sigma = 6$ would prohibit detection of a square
structure of pixel size 6 x 6.

To further clarify the definition of $V$, identify each pattern, up to translation,
with a set $C \subset S_B^{(\sigma)}$ and binary set $\xi = \{\xi_s, s \in C\}$. For instance, for $\sigma = 2$,
the first pattern in Figure 7 is represented by $C = \{(1,3), (3,1), (3,3), (3,5)\}$ and
corresponding $\xi$-values 0, 0, 1, 0. Let $\{C^{(j)}, \xi^{(j)}\}$, $1 \le j \le J$, be a family of
patterns. Then

$V(x) = \sum_{j=1}^{J} \sum_{\tau} \mu_{j,\tau}(x)$

where the inner sum extends over all translates $\tau$ and

$\mu_{j,\tau}(x) = \begin{cases} 1 & \text{if } x_{s+\tau} = \xi_s^{(j)} \text{ for all } s \in C^{(j)} \\ 0 & \text{otherwise} \end{cases}$

Finally, there is a natural extension from $\Omega_B^{(\sigma)}$ to $\Omega_B$ which is useful for
display and evaluation. Given $x \in \Omega_B^{(\sigma)}$, we define $x_s = 0$ for sites
$s \in S_B \setminus S_B^{(\sigma)}$ lying on a row or column disjoint from $S_B^{(\sigma)}$, and
$x_s = x_{t_1} x_{t_2}$ if $s$ lies on a segment $\langle t_1, t_2 \rangle_\sigma$. Thus, for example, the state
$x \in \Omega_B^{(2)}$ in Figure 8(a) is identified with the state $x \in \Omega_B$ in 8(b).

Completion of boundary configuration, from $\sigma = 2$ to $\sigma = 1$
FIGURE 8

Summary. We are given

(i) a grey-level image $y = \{y_{ij}\}$;

(ii) a resolution level $\sigma = 1, 2, \ldots$;

(iii) a disparity measure $\phi(\alpha^{-1} \Delta_{s,t}(y))$ for each neighbor pair $\langle s,t \rangle_\sigma$ in the
sub-lattice $S_B^{(\sigma)}$;

(iv) a collection of penalty patterns.

The (MAP) boundary estimate $\hat{x} = \hat{x}(y)$ is any solution $x \in \Omega_B^{(\sigma)}$ of the
constrained optimization

$\text{minimize}_{x : V(x) = 0} \; \sum_{\langle s,t \rangle_\sigma} (1 - x_s x_t)\, \phi\left(\alpha^{-1} \Delta_{s,t}(y)\right)$

where $V(x)$ is the number of penalties in $x$.
4. ALGORITHMS

We begin with an abstract formulation of the optimization and sampling problems
outlined in Sections 1-3. We are actually interested in posterior distributions,
$\Pi(x|y)$, but $y$ is fixed by observation, and can be ignored in this discussion of
computational issues. Thus, we are given two functions $U$ and $V$ on a space of
configurations $\Omega = \{(x_{s_1}, x_{s_2}, \ldots, x_{s_M}) : x_{s_i} \in \Lambda,\; 1 \le i \le M\}$ where
$\Lambda$ is finite, $M$ is very large, and $S = \{s_1, s_2, \ldots, s_M\}$ is a collection of “sites”,
typically a 2-D lattice. Write $x = (x_{s_1}, \ldots, x_{s_M})$ for an element of $\Omega$, and let

$\Omega^* = \{x : V(x) = v\}, \quad v = \min_x V(x)$

$\Pi^*(x) = \delta_{\Omega^*}(x)\, \dfrac{\exp\{-U(x)\}}{\sum_{x' \in \Omega^*} \exp\{-U(x')\}}$

We wish to solve the constrained optimization problem minimize $\{U(x) : V(x) = v\}$
or to sample from the Gibbs distribution $\Pi^*$. (Recall that sampling
allows us to entertain estimates other than the MAP.)

We have studied [25] Monte Carlo site-replacement algorithms for the
unconstrained versions of these problems: stochastic relaxation (SR) for sampling, and
stochastic relaxation with simulated annealing (SA) for optimization. SA was
devised in [12] and [47] for minimizing a “cost functional” $U$ (e.g. the tour length for
the traveling salesman problem) by regarding $U$ as the energy of a physical system
and simulating the dynamics of chemical annealing. The effect is to drive the
system towards the “ground states”, i.e. the minimizers of $U$. This is accomplished by
applying the Metropolis (relaxation) algorithm to the Boltzmann distribution

(4.1)  $\dfrac{e^{-U(x)/t}}{\sum_{x'} e^{-U(x')/t}}$

at successively lower values of the “temperature” $t$.

We presented two theorems in [25]: one for generating a sequence $\{X(k)\}$
which converges in distribution to (4.1) for $t = 1$ (SR), and one for generating a
sequence $\{X(k)\}$ having asymptotic distribution the uniform measure over
$\Omega_0 = \{x \in \Omega : U(x) = u\}$, $u = \min_x U(x)$, (SR with SA). The essence of the latter
algorithm is a “cooling schedule” $t = t_1, t_2, \ldots$ for guaranteeing convergence. SA has
been extensively studied recently ([5], [14], [28], [29], [35], [39], [41], [67]); see also the
comprehensive review [1] and the references therein. Applications have emerged in
neural networks and circuit design, to name but two areas.

Results  concerning  constrained  SR  and  SA  are  reported  in  [23],  which  was 
motivated  by  a  desire  to  find  a  theoretical  foundation  for  the  algorithms  used  here. 




We  have  deviated  from  the  instructions  in  [23],  with  regard  to  the  cooling  schedule, 
but  at  least  we  know  that  the  algorithms  represent  approximations  to  rigorous 
results. 

Both algorithms produce a Markov chain on $\Omega$ by sampling from the low-order,
marginal conditional distributions of the free Gibbs measures

$\Pi(x; t, \lambda) = \dfrac{\exp\{-t^{-1}(U(x) + \lambda V(x))\}}{\sum_{x'} \exp\{-t^{-1}(U(x') + \lambda V(x'))\}}$

It is easy to check that

(4.2)  $\lim_{\lambda \to \infty} \Pi(x; 1, \lambda) = \Pi^*(x)$

and that

(4.3)  $\lim_{t \downarrow 0,\; \lambda \uparrow \infty} \Pi(x; t, \lambda) = \begin{cases} 1/|\Omega_0^*| & x \in \Omega_0^* \\ 0 & \text{otherwise} \end{cases}$

where $\Omega_0^* = \{\omega \in \Omega^* : U(\omega) = \xi\}$, $\xi = \min_{x \in \Omega^*} U(x)$. Let $\Pi_0$ denote the
uniform measure in (4.3). Sampling directly from $\Pi(x; t, \lambda)$ is impossible due to the
size of $\Omega$; otherwise just use (4.2) and (4.3) to generate a sequence of random
variables $X(k)$, $k = 1, 2, \ldots$, with values in $\Omega$, and limiting distribution either
$\Pi^*$ or $\Pi_0$. However, we can evaluate ratios $\Pi(x; t, \lambda)/\Pi(z; t, \lambda)$, $x, z \in \Omega$, and
hence conditional probabilities. The price for indirect sampling is that we must
restrict the rate of growth of $\lambda$ and the rate of decrease of $t$.

Fix two sequences $\{t_k\}$, $\{\lambda_k\}$, a “site visitation” schedule $\{A_k\}$, $A_k \subset S$,
and let $\Pi_k(x) = \Pi(x; t_k, \lambda_k)$. The set $A_k$ is the cluster of sites to be updated
at “time” $k$; the “centers” of the clusters are addressed in a raster scan. In our
experiments we take either $|A_k| = 1$ or $|A_k| = 5$, in which case the $A_k$’s are of the
form $\{(i,j), (i+1,j), (i-1,j), (i,j+1), (i,j-1)\}$.

Define a non-homogeneous Markov chain $\{X(k), k = 0, 1, 2, \ldots\}$ on $\Omega$ as
follows. Put $X(0) = \eta$ arbitrarily. Given $X(k) = (X_{s_1}(k), \ldots, X_{s_M}(k))$, define
$X_s(k+1) = X_s(k)$ for $s \notin A_{k+1}$ and let $\{X_s(k+1) : s \in A_{k+1}\}$ be a
(multivariate) sample from the conditional probability distribution
$\Pi_{k+1}(x_s, s \in A_{k+1} \mid x_s = X_s(k), s \notin A_{k+1})$. Then, under suitable conditions
on $\{t_k\}$ and $\{\lambda_k\}$, either

$\lim_{k \to \infty} P(X(k) = x \mid X(0) = \eta) = \Pi^*(x)$

or the limit is $\Pi_0(x)$. The condition in the former case (constrained SR) is that
$t_k \equiv 1$, $\lambda_k \nearrow \infty$, and $\lambda_k \le \text{const.} \cdot \log k$. The condition for convergence to
$\Pi_0$ (constrained SA) is that $t_k \searrow 0$, $\lambda_k \nearrow \infty$ and
$t_k^{-1} \lambda_k \le \text{const.} \cdot \log k$. The algorithm yields a solution to the constrained
optimization problem (1.2) in the sense that the asymptotic distribution of $X(k)$ is
uniform over the solution set: if the solution is unique, i.e. $\Omega_0^* = \{x_0\}$, then
$X(k) \to x_0$ in probability. See [23] for proofs.
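
The chain just described can be sketched for single-site updates as follows. This is a minimal Python illustration under stated assumptions, not the implementation used in the experiments: the function name `constrained_relaxation`, the one-dimensional toy energy `U`, the toy penalty `V` (number of isolated ones), the data `d`, and the schedules are all hypothetical, and the energies are recomputed globally for simplicity rather than locally.

```python
import math
import random

def constrained_relaxation(U, V, x0, states, t_sched, lam_sched, sweeps, seed=0):
    """Single-site sampling from Pi(x; t_k, lam_k), proportional to
    exp(-(U(x) + lam_k * V(x)) / t_k), with one raster scan per sweep k."""
    rng = random.Random(seed)
    x = list(x0)
    for k in range(1, sweeps + 1):
        t, lam = t_sched(k), lam_sched(k)
        for s in range(len(x)):               # raster-scan site visitation
            energies = []
            for a in states:                  # energy of each candidate value
                x[s] = a
                energies.append(U(x) + lam * V(x))
            e_min = min(energies)             # shift for numerical stability
            weights = [math.exp(-(e - e_min) / t) for e in energies]
            x[s] = rng.choices(states, weights=weights)[0]
    return x

# Toy problem (hypothetical): fit binary data d while forbidding isolated ones.
d = [0, 0, 1, 0, 0, 1, 1, 1]

def U(x):
    return sum((xi - di) ** 2 for xi, di in zip(x, d))

def V(x):
    n = len(x)
    return sum(
        1 for i in range(n)
        if x[i] == 1 and (i == 0 or x[i - 1] == 0) and (i == n - 1 or x[i + 1] == 0)
    )

# Constrained SR: t_k = 1 with lam_k growing like log k.
sr_sample = constrained_relaxation(
    U, V, d, [0, 1], lambda k: 1.0, lambda k: math.log(1 + k), sweeps=50
)
```

Replacing the schedules by, for example, `t_sched = lambda k: 1 / math.sqrt(math.log(1 + k))` and `lam_sched = lambda k: math.sqrt(math.log(1 + k))` gives one choice satisfying the constrained SA condition that the ratio of `lam_k` to `t_k` grows no faster than a constant times log k.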

Approximations. The logarithmic rate is certainly slow. Still, we often
adhere to it for ordinary annealing; others ([44], [57]) have as well. We refer the
reader to [43] for some interesting comparisons between schedules and to [67] for
“fast annealing” algorithms. It is commonplace to find linear ($t_k = t_0 - ak$) and
exponential ($t_k = (1 - \gamma)^k t_0$, $\gamma$ small) schedules; here $k$ refers to the number of
sweeps or iterations of $S$; in our experiments $S = S_L^{(\sigma)}$ or $S_B^{(\sigma)}$.

We now describe several protocols used in our experiments. One variant we
do not use is to fix $\lambda_k \equiv \lambda$ very large and do ordinary annealing, which might
appear sensible since the solutions to $\min\{U(x) : V(x) = 0\}$ coincide with those
of $\min\{U(x) + \lambda V(x)\}$ for all $\lambda$ sufficiently large (due to the fact that $\Omega$ is finite).
However this is not practical: unless $t_0$ is very large and $t_k$ is reduced very slowly,
the system immediately gets stuck in local energy minima of $U + \lambda V$ which are
basically independent of the data, although faithful to the constraints. It is better
to begin with states faithful to the data and slowly impose the constraints, a standard
technique in conventional optimization.

One variation of constrained SR that has been effective is “low-temperature
sampling”: fix $t_k = \epsilon$ (small) and let $\lambda_k \uparrow \infty$. The idea is to reach a likely state
of the posterior distribution $\Pi(x|y)$. In practice, we allow $\lambda_k$ to grow linearly; the
details are in Section 5.

Another variation is the analogue for constrained relaxation of “zero-temperature”
sampling, which has been extensively studied by Besag [2] under the name
ICM (for “iterated conditional modes”); see also [15], [18] and [27]. Without constraints,
this algorithm, which is deterministic, results in a sequence of states $X(k)$
which monotonically decrease the global energy, i.e. increase the posterior likelihood.
The constrained version operates as follows. Recall that when the set of
sites $A_{k+1}$ is visited for updating, we defined $X(k+1)$ by replacing the coordinates
of $X(k)$ in $A_{k+1}$ by a sample drawn from the conditional distribution of $\Pi_{k+1}$ on
$\{x_s, s \in A_{k+1}\}$ given the values $\{x_s = X_s(k), s \notin A_{k+1}\}$. Suppose we replace the
sample with the mode, i.e. the most likely vector $\{x_s, s \in A_{k+1}\}$ conditional upon
$\{x_s = X_s(k), s \notin A_{k+1}\}$. In essence, we fix $t_k = 0$. This generates a deterministic
sequence $X(k)$, $k = 0, 1, 2, \ldots$ depending only on $X(0)$, $\Pi_k$, and $\{A_k\}$. (Notice that
the mode is unaffected by $t_k$ since it corresponds to the minimum of $U(x) + \lambda_k V(x)$.)
Then, during the $k$th sweep, with $\lambda = \lambda_k$, the energy $U + \lambda_k V$ is successively reduced,
just as in ICM where $\lambda_k = 0$. Of course, since there is no fixed (reference)
energy, the algorithm cannot be conceived as one of iterative improvement. Several
experiments were run with both the stochastic and deterministic algorithms; see
Section 5.
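To make the constrained “zero-temperature” update concrete, here is a toy sketch on a one-dimensional chain of binary labels. Everything specific in it is an assumption chosen for illustration, not the model of this paper: the data term $U$ is a sum of squared differences to observations `y`, the penalty $V$ counts isolated labels (a stand-in for a forbidden pattern), sites are updated one at a time, and ties go to label 0 rather than being broken at random.

```python
def energy(x, y, lam):
    """U(x) + lam * V(x) for a binary chain x and observations y."""
    u = sum((xi - yi) ** 2 for xi, yi in zip(x, y))        # data term U
    v = sum(1 for i in range(1, len(x) - 1)
            if x[i] != x[i - 1] and x[i] != x[i + 1])       # penalty V: isolated labels
    return u + lam * v

def constrained_icm(y, lam_schedule, n_sweeps):
    """Deterministic relaxation: each site takes its conditional mode under U + lam_k * V."""
    x = [1 if yi > 0.5 else 0 for yi in y]                  # start faithful to the data
    for k in range(1, n_sweeps + 1):
        lam = lam_schedule(k)                               # penalty weight for sweep k
        for i in range(len(x)):                             # raster scan over sites
            x[i] = min((0, 1),
                       key=lambda v: energy(x[:i] + [v] + x[i + 1:], y, lam))
    return x
```

Within a sweep the energy $U + \lambda_k V$ can only decrease, but because $\lambda_k$ changes between sweeps there is no single energy that decreases throughout, which is the point made in the text.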


5.  EXPERIMENTS  1 


PARTITION  MODEL 

There are three experiments: an L-band synthetic aperture radar (SAR)
image (see footnote 2) of ice floes in the ocean (Figure 9), a texture mosaic constructed from the
Brodatz album [8] (Figure 10), and another mosaic from pieces of rug, plastic and
cloth (Figure 11).

Processing. In each experiment the partitioning was randomly initiated; the
labels $x_s$, for sites $s$ of the label lattice, were chosen independently and uniformly
from $\{0, 1, \ldots, P - 1\}$.
Thereafter, label sites were visited and updated one at a time, by a “raster scan”
sweep through the label array. MAP partitionings were approximated by “zero-temperature”
sampling (see §4), with $\lambda = \lambda_k$ increasing with the number of sweeps.
Specifically, $\lambda$ was held at 0 through the first 10 sweeps, and thereafter was raised by
1 every 5 sweeps: $\lambda_k = 0$, $k = 1, \ldots, 10$; $\lambda_k = 1$, $k = 11, \ldots, 15$; $\lambda_k = 2$, $k = 16, \ldots, 20$;
etc. Most probably, $\lambda$ could have been increased more rapidly, perhaps with every
sweep, without substantially changing the results, but this was not systematically
investigated. For the three experiments shown in Figures 9, 10, and 11, between
15 and 50 sweeps sufficed to bring the changes in labels to a halt; see below for
more details. Recall that zero-temperature sampling corresponds to choosing the
conditional mode. Occasionally there are ties, and these were resolved by choosing
randomly, and uniformly, from the collection of modes.
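The penalty schedule and the tie-breaking rule described above are simple enough to state as code. This is a sketch only; the function names are hypothetical, and `energies` is assumed to hold the conditional energy of each candidate label at the site being updated.

```python
import random

def penalty_weight(k):
    """lambda_k for sweep k (1-based): 0 for sweeps 1-10, then raised by 1 every 5 sweeps."""
    if k <= 10:
        return 0
    return (k - 11) // 5 + 1

def conditional_mode(energies, rng=random):
    """Index of the lowest-energy label; ties are broken uniformly at random."""
    best = min(energies)
    modes = [label for label, e in enumerate(energies) if e == best]
    return rng.choice(modes)
```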

As a general rule, results were less reliable at higher resolutions (lower $\sigma$'s)
and when more labels were allowed (higher values of $P$). In these cases, repeated
experiments, with different initializations, often produced different results. With $P$
too large, homogeneous regions were frequently subdivided, being assigned two or
three labels. With $\sigma$ too small, the tendency was to mislabel small patches within a
given texture. It is likely that many of these mistakes correspond to local minima;
perhaps some could be corrected by following a proper annealing schedule (see
§4), and by more careful choices of thresholds (see below). Here again, definitive
experiments have not been done.

1 Fortran code and terminal sessions are available.

2 We are grateful to the Radar Division at ERIM for providing us with the SAR
image (collected for the U.S. Geological Survey under Contract 14-08-0001-21748
and the Office of Naval Research under Contract N-00014-81-C-0692 and
N-00014-81-C-0295).

Measures of Disparity. Recall that the disparity measure is derived from
the Kolmogorov-Smirnov distance between blocks of pixel data under various
transformations, as defined in §2, equation (2.3). For the SAR image, good partitionings
were obtained using only the raw data: $m = 1$ and $y^{(1)}$ is just $y$ in equation (2.3).
Evidently, grey-level distributions are enough to segment the water and ice “textures”,
at least when supplemented by the “prior constraints” embodied in the
penalty term, $V(x)$.
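For reference, the Kolmogorov-Smirnov distance between two blocks is the largest gap between their empirical distribution functions. The sketch below computes that standard two-sample statistic for two lists of pixel values; how the distance is thresholded and combined across transforms is the business of equation (2.3) and is not reproduced here.

```python
def ks_distance(block_a, block_b):
    """Two-sample Kolmogorov-Smirnov distance: max over v of |F_a(v) - F_b(v)|."""
    a, b = sorted(block_a), sorted(block_b)
    i = j = 0
    d = 0.0
    for v in sorted(set(a) | set(b)):
        while i < len(a) and a[i] <= v:     # count of a-values <= v
            i += 1
        while j < len(b) and b[j] <= v:     # count of b-values <= v
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d
```

The distance depends only on the ordering of the grey levels, so it is unchanged by any strictly increasing transformation applied to both blocks.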

The texture collages in Figures 10 and 11 are harder. We used four data
transformations in addition to the raw pixel data. Hence, for these experiments
$m = 5$, $y^{(1)} = y$, and $y^{(2)}, \ldots, y^{(5)}$ are based on various transforms. In particular,

$y^{(2)}_s$ measures the intensity range in the 7x7 pixel block, $V_s$, centered at $s$:

$$y^{(2)}_s = \max_{t \in V_s} y_t - \min_{t \in V_s} y_t;$$

$y^{(3)}_s$ is the “residual” (equation (1.1)) obtained by comparing $y_s$ to the 24
“boundary pixels” ($\partial V_s$) of $V_s$ (i.e. all pixels on the perimeter of the 7x7
block):

$$y^{(3)}_s = \Big| y_s - \frac{1}{24} \sum_{t \in \partial V_s} y_t \Big|;$$

and $y^{(4)}$ and $y^{(5)}$ are horizontal and vertical “directional residuals”:

$$y^{(4)}_s = \big| y_s - \tfrac{1}{2}(y_{s+(1,0)} + y_{s+(-1,0)}) \big|$$
$$y^{(5)}_s = \big| y_s - \tfrac{1}{2}(y_{s+(0,1)} + y_{s+(0,-1)}) \big|.$$
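The data transformations above can be sketched for a single interior pixel as follows. The image is assumed to be a list of rows indexed `y[r][c]`, the perimeter residual is read as the absolute difference between the centre pixel and the mean of the 24 perimeter pixels (an assumption about the form of the residual), and which of the two coordinates counts as “horizontal” is left to the caller through the offset `(dr, dc)`.

```python
def window(y, r, c, half=3):
    """Pixels of the (2*half+1) x (2*half+1) block V_s centred at (r, c)."""
    return [y[i][j] for i in range(r - half, r + half + 1)
                    for j in range(c - half, c + half + 1)]

def range_feature(y, r, c):
    """Intensity range (max minus min) over the 7x7 block."""
    block = window(y, r, c)
    return max(block) - min(block)

def perimeter_residual(y, r, c, half=3):
    """|y_s - mean of the perimeter pixels| (24 pixels for a 7x7 block)."""
    ring = [y[i][j] for i in range(r - half, r + half + 1)
                    for j in range(c - half, c + half + 1)
                    if max(abs(i - r), abs(j - c)) == half]
    return abs(y[r][c] - sum(ring) / len(ring))

def directional_residual(y, r, c, dr, dc):
    """|y_s - (y_{s+(dr,dc)} + y_{s-(dr,dc)}) / 2| along one direction."""
    return abs(y[r][c] - 0.5 * (y[r + dr][c + dc] + y[r - dr][c - dc]))
```

All three residuals vanish on a linear intensity ramp, which suggests why such features are less sensitive to gradual lighting variation than the raw grey levels.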

Parameter Selection. The resolution ($\sigma$) was 7 for the SAR picture (Figure
9);  15  for  the  Brodatz  collage;  and  13  for  the  pieces  of  rug,  plastic  and  cloth. 
These  numbers  were  chosen  more  or  less  ad  hoc,  but  are  small  enough  to  capture 
the  important  detail  of  the  respective  pictures  while  not  so  small  as  to  incur  the 
degraded  performance,  seen  at  higher  resolutions  and  mentioned  earlier. 

The  number  of  allowed  labels  is  also  important;  recall  that  too  many  usually 
results  in  over-segmentation.  This  was  actually  used  to  advantage  in  the  SAR 
experiment  (Figure  9),  where  there  are  evidently  two  varieties  of  ice.  The  best 
segmentations  were  obtained  by  allowing  three  labels.  Invariably,  two  would  be 
assigned  to  the  ice,  and  one  to  the  water.  Using  just  two  labels  led  to  mistakes  within 
the  ice  regions,  although  there  was  little  experimentation  with  the  Kolmogorov- 
Smirnov  threshold,  and  no  attempt  was  made  with  the  data  transforms  (m  >  1) 
used  for  the  collages.  In  the  other  experiments,  the  number  of  labels  was  set  to  the 
number  of  texture  species  in  the  scene. 

The most important parameters were the thresholds, $\{c_i\}$, $1 \le i \le m$, associated
with the Kolmogorov-Smirnov statistics (see (2.3)). For the SAR experiment, $m = 1$,
and the threshold was guessed, a priori; it was found that small changes are
reflected  only  in  the  lesser  details  of  the  segmentation.  For  the  collages  (m  =  5),  the 
thresholds  were  chosen  by  examining  histograms  of  Kolmogorov-Smirnov  distances 
for  block  pairs  within  homogeneous  samples  of  the  textures.  Thresholds  were  set 
so  that  no  more  than  three  or  four  percent  of  these  intra-region  distances  would 
be  above  threshold  (a  “false  alarm”).  Of  course,  we  would  have  preferred  to  find 
more  or  less  universal  thresholds,  one  for  each  data  transform,  but  this  may  not 
be  possible.  Conceivably,  with  enough  of  the  “right”  transforms,  one  could  set 
conservative  (high)  and  nearly  universal  thresholds,  and  be  assured  that  visibly 
distinct  textures  would  be  segmented  with  respect  to  at  least  one  of  the  transforms. 
Recall  that  the  disparity  measure  (equation  (2.3))  is  constructed  to  signal  “different” 
when  the  distance  between  blocks,  with  respect  to  any  of  the  transforms,  exceeds 
threshold. 
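The threshold rule and the “any transform” disparity test described in this paragraph can be sketched as follows. The function names are hypothetical, and the percentile convention (take the smallest sampled distance that leaves at most the allowed fraction above it) is one reasonable reading, not necessarily the one used in the experiments.

```python
def threshold_for(distances, false_alarm=0.03):
    """Smallest sampled distance with at most `false_alarm` of the samples above it."""
    ordered = sorted(distances)
    n = len(ordered)
    allowed = int(false_alarm * n)           # how many distances may exceed the threshold
    return ordered[n - allowed - 1]

def is_different(dists, thresholds):
    """Signal "different" when the distance under any transform exceeds its threshold."""
    return any(d > c for d, c in zip(dists, thresholds))
```

Here `distances` would hold Kolmogorov-Smirnov distances between block pairs drawn from a homogeneous sample of one texture, computed separately for each transform.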

Figure 9 (SAR). As mentioned earlier, three labels were used, with the expectation
that the ice would segment into two regions (basically, dark and light).
The resolution was $\sigma = 7$, and the Kolmogorov-Smirnov statistic was computed
only on the raw data, so $m = 1$. The threshold was $c_1 = .15$. The original image is
512 x 512 (the pixel resolution is about 4m by 4m), but to avoid special treatment
of the boundary, only the 462 x 462 piece shown in 9(A) was processed. The label
lattice is 64 x 64. Figure 9(B) shows the evolution of the partitioning during
the relaxation. For display, grey levels were arbitrarily assigned to the labels. The
upper left panel is the random starting configuration. In successive panels are the
states of the labels after each five iterations (full sweeps). In the bottom right panel,
the two labels associated with ice are combined, “by hand”.

Figure 10 (Brodatz Textures). The Kolmogorov-Smirnov thresholds were
$c_1 = .40$, $c_2 = .53$, $c_3 = .26$, $c_4 = .28$, and $c_5 = .19$, corresponding to the transforms
discussed above. A 246 x 246 piece of the original 256 x 256 image was
processed, and is shown in Figure 10(A). Leather and water are on top, grass and
wood on the bottom, and sand is in the middle. The resolution was $\sigma = 15$, which
resulted in a 16 x 16 label lattice. Figure 10(B) shows the random starting
configuration (upper left panel), the configuration after 5 iterations (upper right
panel), after 10 iterations (lower left panel), and after 15 iterations (lower right
panel), by which point the labels had stopped changing.

Figure 11 (Rug, Plastic, Cloth). The 216 x 216 image in Figure 11(A) was
partitioned at resolution $\sigma = 13$, with a 16 x 16 label lattice. The Kolmogorov-Smirnov
thresholds were $c_1 = .90$, $c_2 = .49$, $c_3 = .20$, $c_4 = .11$, and $c_5 = .12$,
corresponding to the same data transforms used for the Brodatz textures (Figure
10). The experiment makes apparent a hazard of long range bonds: the gradual but
marked lighting variation across the top of the image produces a large Kolmogorov-Smirnov
distance when raw pixel blocks from the left and right sides are compared.
This makes it necessary to essentially ignore the raw data Kolmogorov-Smirnov
statistic, and base the partitioning on the four data transformations; hence the
threshold $c_1 = .9$. The transformed data are far less sensitive to lighting gradients.
Figure 11(B) displays the evolution of the partitioning during relaxation. The layout
is the same one used in the previous figures, showing every 5 iterations, except
that there are 10 iterations between the final two panels. The lower right panel is
the partitioning after the 30th sweep, by which time the pattern was “frozen”.


PLACE  FIGURES  9,  10, 11  HERE 


BOUNDARY  MODEL 

There  are  five  test  images:  one  made  indoors  from  tinkertoys  (“cart”),  an  out¬ 
door  scene  of  a  house,  another  of  ice  floes  in  the  ocean  (the  same  SAR  image  used 
above),  and  two  texture  mosaics  constructed  from  the  Brodatz  album. 

Processing. All the experiments were performed with the same site-visitation
schedule. Given the resolution $\sigma$, which varies among experiments, the sites of the
boundary sub-lattice were addressed in a raster scan and five sites were simultaneously
updated. Specifically, at each visit to the site $(i\sigma + 1, j\sigma + 1)$, the values of the
boundary process at this site and its four nearest neighbors, $\{((i \pm 1)\sigma + 1, (j \pm 1)\sigma + 1)\}$,
were replaced based on the conditional distribution of these five boundary
variables given the variables at the other sites and the data $y$. Of course this
distribution is concentrated on the $2^5 = 32$ possible configurations for these five
variables.

Two update mechanisms were employed: stochastic relaxation and the “zero-temperature”,
deterministic variation discussed earlier. In the former case, the
updated binary quintuple is a sample from the aforementioned conditional distribution,
which varies depending on the penalty weight $\lambda_k$ for the $k$th sweep of
the lattice. Of course “0-temperature” refers to replacing the sample by the
conditional mode.

Constrained simulated annealing was not used, at least not in accordance with
the formula in which $t_k \searrow 0$, $\lambda_k \nearrow \infty$ and $\lambda_k \le \mathrm{const} \cdot \log k$ ($k$ = sweep number).
Instead, we let $\lambda_k$ grow linearly and, in the case of stochastic relaxation, we fixed
the temperature $t_k$ at some “small” value.
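The two update mechanisms differ only in how one of the 32 candidate configurations is selected. The sketch below assumes the conditional distribution is proportional to $\exp\{-(U + \lambda_k V)/t_k\}$ and that a caller-supplied function `energy(cfg)` (hypothetical) returns the pair $(U, V)$ for a candidate quintuple with all other sites held fixed.

```python
import itertools
import math
import random

def update_quintuple(energy, lam, t, rng=random):
    """Return a new 5-bit configuration: the mode if t == 0, else a Gibbs sample."""
    configs = list(itertools.product((0, 1), repeat=5))     # 2**5 = 32 candidates
    totals = [u + lam * v for u, v in map(energy, configs)]
    if t == 0:                                              # "zero temperature": conditional mode
        return configs[totals.index(min(totals))]
    low = min(totals)                                       # subtract the minimum for stability
    weights = [math.exp(-(e - low) / t) for e in totals]
    return rng.choices(configs, weights=weights, k=1)[0]
```

At a small fixed temperature the sample is almost always the mode, but the occasional uphill move is what lets stochastic relaxation escape configurations in which the deterministic variant stays put.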

Stochastic relaxation at low temperature is more effective than at zero temperature
(essentially iterative improvement). However, deterministic relaxation sufficed
for all but two scenes, the ice floes and the four-texture collage; these results could
not be duplicated with deterministic relaxation. In one case, we present both results
for comparison.

Generally,  deterministic  relaxation  stabilizes  in  5  to  10  sweeps  whereas  stochastic 
relaxation  requires  more  sweeps,  perhaps  20  to  60.  We  provide  several  pictures 
showing  the  evolution  of  the  algorithm. 

Penalties. All the experiments were conducted with the same forbidden patterns,
namely those in Figure 12, with the exception of the house scene, for which
the last pattern was omitted. (At the resolution used for the house, namely $\sigma = 3$,
the inclusion of that pattern would inhibit the formation of structures at the scale
of six pixels; many such non-trivial structures appear in that scene.) Thus, the
penalty function $V(x)$ records a unit penalty for each occurrence in the boundary
map $x = \{x_s\}$ of any of the five patterns depicted in Figure 12. It is interesting
to note that in no case was the final labelling $\hat{x}$ completely free of penalties,
i.e. $V(\hat{x}) = 0$. Perhaps this could be achieved with a proper annealing schedule, or
with updates of more than five sites.

FIGURE 12. Forbidden patterns

Measures  of  Disparity.  All  the  experiments  are  based  on  instances  of  the 
measures  (3.3)-(3.5)  described  in  §3. 

(i) For the first experiment, the cart scene, the boundary resolution is $\sigma = 1$ and we
employed the measure given in (3.3) with $\gamma = 10\Delta$ and the raw difference $|y_{s^*} - y_{t^*}|$
modulated by the four nearest differences of the same orientation as $\langle s^*, t^* \rangle$.
Thus, for the horizontal pair $\langle s, t \rangle$ of adjacent boundary sites,

$$\Delta_{s,t}(y) = \frac{|y_{s^*} - y_{t^*}|}{10\Delta + \sum_{k = \pm 1, \pm 2} |y_{i,j+k} - y_{i+1,j+k}|}$$

where $s^* = (i,j)$, $t^* = (i+1,j)$, and $\Delta$ is the mean absolute intensity difference
over the image. The utility seems largely impervious to the choice of the scaling
constant (here 10) for the mean as well as to the range of the modulation.

(ii) We used the Kolmogorov-Smirnov measure (3.4) for both the house and ice
floes scenes. For the house, we chose $\sigma = 3$ and blocks of size 25; the set-up is
depicted in Figure 6. Due to the uniform character of the background (e.g. the
sky) the distance (3.4) was computed based on the transformed data $y'_{ij} = y_{ij} + \eta_{ij}$,
where $\{\eta_{ij}\}$ are independent variables, distributed with a triangular density,
specifically that of $10(u_1 + u_2)$, with $u_1$, $u_2$ uniform (and independent) on [0, 1].

The boundary resolution for the radar experiment is $\sigma = 8$, reflecting the larger
important structures there; the image is 512 x 512. The dynamic range is very
narrow and the difference between the dark water and somewhat less dark ice is
essentially one of texture, due in part to the customary speckle noise. In particular,
the ice cannot be well-differentiated from the water based on shading alone. The
disparity measure is (3.4), applied to the raw image data over 24 x 24 blocks. The
problem encountered in the house scene is actually alleviated by the speckle.

(iii) The texture mosaic experiments are based on the measure (3.5) for a particular
family $A_1, \ldots, A_5$ of five data transformations or “features”. In each case, the
resolution is $\sigma = 5$ and block size is 21 x 21. Recall that these five features are
combined into a single measure of change according to the formula in (3.5). The
transformations used are the range

$$\max_{t \in V_s} y_t - \min_{t \in V_s} y_t$$

over a 7 x 7 window $V_s$ centered at pixel $s$, and the four directional residuals

$$|y_s - \tfrac{1}{2}(y_{s+(0,1)} + y_{s+(0,-1)})|$$
$$|y_s - \tfrac{1}{2}(y_{s+(1,0)} + y_{s+(-1,0)})|$$
$$|y_s - \tfrac{1}{2}(y_{s+(1,1)} + y_{s+(-1,-1)})|$$
$$|y_s - \tfrac{1}{2}(y_{s+(-1,1)} + y_{s+(1,-1)})|.$$

These residuals were then uniformly averaged over $V_s$, yielding the final features
$y^{(1)}, \ldots, y^{(5)}$.

It is instructive to compare the Kolmogorov-Smirnov differences for the raw and
transformed data over these texture mosaics. Typically, if one looks at the resulting
two histograms of differences for a given transform, one finds that, whereas the raw
(Kolmogorov-Smirnov) differences are actually larger at the texture borders, the
transitions between the borders and interiors are sharper for the transformed data.
Detecting the boundaries with the raw data necessitates an unacceptable number
of “false alarms” in the sense of interior “micro-edges”.

Finally, the values of the constants $c_1, \ldots, c_5$ used in the construction of $\Delta_{s,t}$
(see (3.5)) are selected by restricting the percentage of false alarms. The details are
given in the following section.
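Two of the ingredients in (i) and (ii) above are easy to sketch: the modulated difference used for the cart scene, and the triangular noise added to the house image. Both functions are illustrations under assumptions: the image is a list of rows indexed `y[i][j]`, the pixel pair is $s^* = (i,j)$, $t^* = (i+1,j)$, and `mean_abs_diff` is the mean absolute intensity difference over the image, computed elsewhere.

```python
import random

def modulated_difference(y, i, j, mean_abs_diff, scale=10.0):
    """Raw difference across the pair (i,j),(i+1,j), damped by the four nearest parallel differences."""
    raw = abs(y[i][j] - y[i + 1][j])
    neighbours = sum(abs(y[i][j + k] - y[i + 1][j + k]) for k in (-2, -1, 1, 2))
    return raw / (scale * mean_abs_diff + neighbours)

def jitter(image, rng=random):
    """Add independent triangular noise 10*(u1 + u2), with u1, u2 uniform on [0, 1], to every pixel."""
    return [[v + 10.0 * (rng.random() + rng.random()) for v in row] for row in image]
```

An isolated jump scores high under `modulated_difference`, while the same jump embedded in a run of equal jumps is damped; the jitter breaks the many exact ties that a uniform background would otherwise produce in the empirical distributions.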

Parameter Selection. Recall that the total change across the boundary segment
$\langle s, t \rangle$ is measured by $\phi(\alpha^{-1} \Delta_{s,t}(y))$, where $\phi$ is given in (3.2). Given $\Delta$,
there are two parameters to choose: a normalizing constant $\alpha$ and the intercept
$\beta = \phi^{-1}(0)$; the “threshold” for $\Delta$ is then $d^* = \alpha\beta$.

For the object boundary experiments, namely the cart, house and ice floes,
the parameters $\alpha$ and $\beta$ were chosen as follows. Find the mean disparity over all
(vertical and horizontal) values of $\Delta_{s,t}$ for relevant bonds $\langle s, t \rangle$; take $\alpha$ equal to
the 99th percentile of those above the mean and $d^*$ equal to the 70th percentile of
those above the mean. This yields the values $\alpha = 150$, $\beta = .28$ for the cart scene;
recall that for this experiment, both the grid and block sizes are unity. For the
house scene ($\sigma = 3$) the Kolmogorov-Smirnov statistics were computed over 5x5
blocks, and the resulting parameters are then $\alpha = 1$ and $\beta = .7$. (The number of
distances at (the maximum) value $\Delta = 1$ was considerable.) Finally, for the ice
floes, the recipe above yielded $\alpha = .33$, $\beta = .40$.
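The recipe for the object boundary experiments can be sketched as follows. The nearest-rank percentile convention and the function names are assumptions made for illustration; `disparities` stands for the collected values of the disparity over all relevant bonds.

```python
def percentile(sorted_values, q):
    """Nearest-rank percentile of an ascending list, for an integer q between 0 and 100."""
    rank = max(1, (q * len(sorted_values) + 99) // 100)    # integer ceiling of q*n/100
    return sorted_values[rank - 1]

def choose_alpha_beta(disparities):
    """alpha = 99th percentile, d* = 70th percentile of the disparities above the mean."""
    mean = sum(disparities) / len(disparities)
    above = sorted(d for d in disparities if d > mean)
    alpha = percentile(above, 99)
    d_star = percentile(above, 70)                         # threshold d* = alpha * beta
    return alpha, d_star / alpha
```

For the cart scene, the reported values $\alpha = 150$ and $\beta = .28$ correspond to a threshold $d^* = \alpha\beta = 42$.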

Turning to the experiments with texture mosaics, let $c_{ik}$ denote the normalizing
constant in (3.5) for feature $i$, $1 \le i \le 5$, and texture $k$, $1 \le k \le K$, where $K$ is
the number of textures in the mosaic. For each feature $i$ and texture type $k$, we
computed the histogram of the (combined vertical and horizontal) Kolmogorov-Smirnov
distances and selected $c_{ik}$ as the $100(1 - \gamma)$ percentile of that histogram.
Specifically, we took $\gamma = .01$ for the two Brodatz collages. (Other experiments
indicated that any (small) value of $\gamma$ will suffice, say $0 < \gamma \le .03$.) Thus, $100(1 - \gamma)$
percent of the distances $d(y^{(i)}(D_1), y^{(i)}(D_2))$ are below $c_{ik}$ within each texture
type $k$. Now set $c_i = \max_{1 \le k \le K} c_{ik}$, ensuring that at most $5\gamma(100)$ percent of
the interior differences $\Delta_{s,t}(y)$ within the entire collage will exceed the threshold
$d^* = 1$. Finally, since $\alpha$ and $\beta$ are then constrained by $\alpha\beta = 1$, we put $\alpha = 2$ and
$\beta = .5$.

Figure 13 (Cart Scene). Sixty sweeps of stochastic relaxation were run with
$t_k = .05$ and $\lambda_k \nearrow 3$. Actually, all the boundaries were “in place” after about 10
sweeps, as illustrated in Figure 13(B), which shows every third sweep up to the
forty-sixth. Figure 13(C) shows the forty-sixth sweep at larger scale. The image
is 110 x 110. Not shown is a run with the deterministic algorithm; the results are
virtually indistinguishable.

Figure 14 (House Scene). This 256 x 256 monochrome image was supplied
to us by the VISIONS group at the University of Massachusetts. The update is by
deterministic relaxation with $\lambda_k$ increasing linearly from $\lambda_0 = 0$ to $\lambda_{10} = 2$.

Figure 15 (Ice Floes). The image (15(A)) is 512 x 512. We did sixty sweeps
of stochastic relaxation with $t_k = .1$ and $\lambda_k \nearrow 2$. Figure 15(B) shows sixteen
“snapshots”: every third sweep as in Figure 13, as well as the final (60th) sweep.

Figure 16 (Brodatz Collage 1). The collage is composed of nine Brodatz
textures (16(A)): leather, grass, and pigskin (top row), raffia, wool, and straw (middle
row), [...], wood, and sand (bottom row). Two of the textures, leather and
[...] in the two circles. The image size is 384 x 384, the individual
[...] 128 x 128. We show the results (16(B)) of both the deterministic
(left) and stochastic (right) algorithms; they are roughly comparable. Other false
[...] .005, .02, and .03) yield the same overall quality.

Figure 17 (Brodatz Collage 2). There are four textures (17(A)): raffia (upper
[...] right), wool (bottom), and pigskin (center). Two runs (different
[...] (17(B)), individual frames representing every third sweep.

local configurations in the pair $(x^B, x^L)$, for example “type 1” errors (a boundary
“between” like region labels) and “type 2” errors (no boundary “between” unlike
labels). The problem may be that there are deep local minima which are unfaithful
to the data but difficult to escape from, at least without updating many sites.


7.  SUMMARY 

We have developed algorithms for partitioning an image, possibly textured,
into  homogeneous  regions  and  for  locating  boundaries  at  significant  transitions. 
Both  are  based  on  a  scale-dependent  notion  of  disparity,  or  gradient,  and  both 
employ  a  Bayesian  framework  to  make  use  of  prior  beliefs  about  regular  boundary 
or  region  configurations. 

The disparity measure scores the difference between the statistical structures
of two scale-dependent blocks of pixels. We have experimented with several measures.
Ideally, the disparity will be large when there is an apparent difference, either
in grey-level or in texture, between the blocks. Usually, it was necessary to tune
the measure to the particular textures or structures involved; a more universal measure
may require both better preprocessing (e.g. first extracting reflectance from
intensity [40]) and better use of “high-level” information about expected macrostructures
and shapes. For texture discrimination, by either partitions or boundary
placement, we introduce a class of features, or transformations, that are decidedly
multivariate, depending on the spatial distribution of large numbers of pixel
grey levels. Our disparity measure is then a composite of measures of differences in
the histograms of the block data, under the various transformations. Low-order features,
such as those derived solely from raw grey-level histograms and co-occurrence
matrices, were not as effective in our framework.

Disparity measures between pairs of pixel blocks drive the segmentations or
boundary placements through a “label model” that specifies likely label configurations
conditional on disparity data. For partitioning, labels are generic and associated
with local blocks of the image. Two labels are the same if their respective
regions are judged to be instances of the same texture. For boundary placement,
the labels are zero or one, and interpreted as indicating, respectively, the absence or
presence of boundary elements. A priori knowledge about acceptable label configurations,
which, for example, may preclude very small or thin regions, or cluttered
boundary elements, is applied by restricting labels to an appropriate subset of all
possible configurations. The result of modelling disparity-label interactions and of
defining restricted configurations is a Gibbs distribution jointly on pixel grey levels
and label configurations, with the marginal label distribution supported on a subset
of the configuration space.

Partitioning and boundary finding are accomplished by approximating the maximum
a posteriori (MAP) label configuration, conditioned on observed pixel data.
Because certain configurations are forbidden, MAP estimation amounts to constrained
optimization. Stochastic relaxation and simulated annealing are extended
to accommodate constraints by introducing a non-negative constraint function that
is zero only for allowed label configurations. The constraint function, with a multiplicative
constant, is added to the posterior energy, and the constant is slowly
increased during relaxation. Straightforward calculations establish an upper bound
on the rate of increase of this multiplicative constant that ensures convergence of the
relaxation and annealing algorithms to the desired limits. In a series of partitioning
and boundary-finding experiments, deterministic and other fast variations of the
constrained relaxation algorithm are found to be effective.

The partitioning model is appropriate when a small number of homogeneous
regions are present. Disjoint instances of a common texture are automatically identified.
The boundary model can be effective in complex, multi-textured scenes.
Both models sometimes require prior training to adjust parameters.


MODELING VARIATION TO ENHANCE QUALITY IN MANUFACTURING

I appreciate this opportunity to show you an approach that we at
CPC  have  developed  to  enhance  quality  in  manufacturing  through 
modeling. 

To  illustrate  some  key  ideas  in  this  approach,  I  will  start  with 
a  simple  example:  the  design  of  a  wheelcover. 

I will then go into greater detail on the mathematics of this
approach, followed by another application to a more
complicated example: the design of a door hanging process that
consistently yields the same door closing effort.

I  will  then  close  with  some  comments. 


ILLUSTRATE  KEY  IDEAS:  DESIGNING 
A  WHEELCOVER  FOR  CONSISTENCY 


MATHEMATICAL  FRAMEWORK 


ANOTHER  APPLICATION:  DESIGNING 
DOOR  HANGING  PROCESS  TO  ACHIEVE 
CONSISTENT  DOOR  CLOSING  EFFORT 


CONCLUSIONS 


This is a typical wheelcover (show wheelcover). It is a simple
and small part, typical of the thousands of parts that go into making a
quality car. In our business, the top priority is customer
satisfaction. For this wheelcover, there are at least two
features a customer has come to expect from it: ease of removal
if you have to change the tire; and good retention on the wheel
so you won't lose it when you hit a bump or turn a corner.

Depending  on  the  customer,  a  male  or  a  female,  and  on  the  tool 
used  to  remove  the  wheelcover,  the  retention  force  above  which 
the  cover  becomes  difficult  to  remove  will  vary  from  say  30  N, 
easily  removed;  to  60  N,  completely  unremovable  and  therefore 
100%  unsatisfactory. 

(Slide: WHEELCOVER REMOVAL; axis: PERCENT OF CUSTOMERS DISSATISFIED)

On  the  other  hand,  the  retention  force  below  which  the  cover  will 
fall  off  also  depends  on  the  customer  usage  of  the  car.  Cars  on 
bumpy  roads  and  with  sharp  turns  require  a  higher  retention  force 
than  cars  on  freeway  driving.  In  other  words,  customer 
expectation  on  good  retention  will  vary. 

These two competing requirements combine into a target value of
retention force deemed most satisfactory to the customer. Any
departure from the target value will incur some degree of
customer dissatisfaction and potential loss of market share.




In our business of mass production, we will never be on target
all the time. Most assuredly, we will produce covers with a range
of retention forces whose mean is off target, with a spread of
values about the mean. The mean shift is called the bias, and
the spread is called the variance. Our tasks are: (1) to get
the mean on target (from an engineering viewpoint, that is not
difficult); and (2) to reduce the spread around the mean (that
is difficult).

As  a  starting  point,  we  must  identify  what  are  the  factors  or 
variables  that  affect  the  retention  force. 

This slide shows the back side of a typical wheelcover. The cover
has  three  clips,  each  with  two  prongs,  spaced  around  the 
circumference  to  form  a  circle.  The  diameter  of  this  circle, 
which  I  call  the  clip  diameter,  is  larger  than  the  diameter  of 
the  rim  on  the  wheel.  So  when  you  press  the  cover  onto  the 
wheel,  the  clip  acts  like  a  spring  and  clicks  onto  the  rim. 
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With  an  understanding  of  the  physics,  we  can  now  deploy  customer 
expectation,  retention  force  on  target  with  minimum  variation,  in 
terms  of  engineering  variables.  There  are  two.  The  first  one  is 
the  difference  between  the  clip  diameter  and  the  rim  diameter. 
Remember  that  the  clip  diameter  is  larger  than  the  rim  diameter. 
The  larger  the  difference  in  diameters,  the  larger  is  the  force 
developed.  In  fact,  the  relationship  is  linear.  The  second 
factor  is  the  stiffness  of  the  clip  which  is  the  slope  of  the 
line.  As  I  mentioned  earlier,  it  is  easy  to  get  the  mean  on 
target.  For  example,  a  difference  in  diameters  of  6.65  mm  and  a 
clip  stiffness  of  5.2  N/mm  will  get  us  there. 
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In  our  business  of  mass  production  however,  there  is  always 
variation.  We  never  get  exactly  6.65  mm  but  a  distribution  of 
values  instead.  This  distribution  projects  into  a  distribution 
of  retention  forces  and  results  in  less  than  satisfied  customer. 
This  is  one  point  we  all  must  realize  —  in  mass  production, 
variation  is  a  fact  of  life.  It  is  the  underlying  cause  of  poor 
quality. 
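The projection of a diameter distribution into a force distribution can be
sketched numerically. The block below is only an illustration: it assumes the
linear relation passes through the origin (force = stiffness x diameter
difference), and the 0.5 mm standard deviation is a hypothetical value, not a
figure from the talk.

```python
import random
import statistics

STIFFNESS = 5.2        # clip stiffness from the talk, N/mm
NOMINAL_DIFF = 6.65    # nominal clip-minus-rim diameter difference, mm
ASSUMED_SD = 0.5       # hypothetical spread of the difference, mm (not from the talk)


def retention_force(stiffness, diff):
    """Assumed linear spring model: force (N) = stiffness * diameter difference."""
    return stiffness * diff


def simulate_forces(n, seed=0):
    """Draw n diameter differences and project each one into a retention force."""
    rng = random.Random(seed)
    return [retention_force(STIFFNESS, rng.gauss(NOMINAL_DIFF, ASSUMED_SD))
            for _ in range(n)]


nominal_force = retention_force(STIFFNESS, NOMINAL_DIFF)
forces = simulate_forces(10_000)
print(f"nominal force      : {nominal_force:.2f} N")
print(f"simulated mean, sd : {statistics.mean(forces):.2f} N, "
      f"{statistics.stdev(forces):.2f} N")
print(f"first-order sd     : {STIFFNESS * ASSUMED_SD:.2f} N")
```

Under this model the spread of the force is simply the stiffness times the
spread of the diameter difference, which is why the stiffness acts as the
sensitivity of the design.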

The usual practice is to control the variation by tightening the
tolerance, say by sorting large covers to match with large rims
and small covers with small rims. That, of course, is expensive.
So we achieve quality at extra cost, hoping to recover that cost
through warranty cost reduction and improved customer
satisfaction.

There is, however, another approach. Instead of trying to sort
the covers into large and small, we could choose less stiff clips
spaced at a larger diameter. As you can see, with this choice
we can achieve the same quality at no cost because we do not
have to tighten the tolerance. This is the Taguchi concept of
insensitive design. It says: do not fight variation head on.
Instead, make your design less sensitive to the variation.
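A small sketch of the trade described here, under stated assumptions: the force
is taken as stiffness x diameter difference, only the diameter difference
varies, and its 0.5 mm standard deviation is hypothetical. The function names
are illustrative only.

```python
TARGET_FORCE = 5.2 * 6.65   # N, nominal force implied by the talk's example values
DIFF_SD = 0.5               # mm, hypothetical spread of the diameter difference


def design_for_stiffness(stiffness):
    """Return (diameter difference, force sd) for a clip that keeps the mean on target."""
    diff = TARGET_FORCE / stiffness          # a softer clip needs a larger difference
    force_sd = stiffness * DIFF_SD           # first-order spread of the force
    return diff, force_sd


def allowed_diff_sd(stiffness, force_sd_limit):
    """Diameter-difference spread a clip can absorb for a given force spread."""
    return force_sd_limit / stiffness


designs = {k: design_for_stiffness(k) for k in (5.2, 2.6, 1.3)}
for k, (diff, sd) in designs.items():
    print(f"stiffness {k:.1f} N/mm: difference {diff:.2f} mm, force sd {sd:.2f} N")
```

Every design in the table has the same mean force; halving the stiffness halves
the force spread for the same tolerance. Read the other way,
`allowed_diff_sd(1.3, 2.6)` shows that the softest clip could absorb four times
the original diameter spread and still match the original force spread.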

At this stage, the warranty cost comes down and the customer is
satisfied. But we, as engineers, should not be satisfied.

Because, by choosing still weaker clips and a larger clip
diameter, we can open up the tolerance and still arrive at the
same quality. Now the cost really comes down. We can go with
less expensive suppliers; our processes can be less precise; and
the labor need not be skilled. In other words, we now achieve
quality at extra profit, or "better for less". That, better for
less, is the primary motivation behind the concept of insensitive
design. Warranty cost reduction is only a secondary by-product.
It comes naturally when we do our design right.


Since better for less is our goal, our approach to
process/product design must change. In this problem, for
example, the usual practice is for a design engineer to decide
on the nominal clip diameter and stiffness to get the nominal
retention force on target; the manufacturing people then decide
what tolerances should go with them. Herein lies the crux of the
problem: when the design engineer specifies the nominal clip
diameter and stiffness, he already fixes the sensitivity of the
wheelcover to the variation of mass production. The only means
left to the manufacturing people to reduce variation in
retention force, if they want to, is to tighten the tolerances
of the clip diameter and stiffness, which as I said earlier is
expensive. The more cost-effective approach is the other way
around. Go first to the manufacturing plant and negotiate for
the most cost-effective tolerance; then come back and decide
what the nominal clip diameter and stiffness should be to arrive
at the quality we want.

To summarize, what we have here is a miniature QFD (Quality
Function Deployment), in which we have deployed customer
expectation in terms of product/process variables. Now we can
explore, very early in the design phase, the different
product/process design alternatives that will produce a quality
product.

This sketch illustrates what we are trying to do. In on-line
quality control, we try to inspect quality into the product
through SPC and related statistical tools. I think that is too
late and too expensive. In the Taguchi method and related design
of experiment methodology, we attempt to figure out early in the
development phase what factors adversely affect quality and
desensitize the design against these factors. This is a big
step forward. But I think that also is too expensive and too
late. What I propose is to carry the activities further up front,
into the design phase. Only then can we achieve the greatest
impact.

Let  me  show  you  then  the  mathematical  framework  for  carrying  out 
this  strategy. 


Let's go back to the wheelcover problem. In the neighborhood of
the target $\tau$, we may approximate customer dissatisfaction as
a quadratic function of the retention force $f$ about the target.
Assuming loss to be directly proportional to customer
dissatisfaction, the fraction $p(f)\,df$ of the wheelcover
population with retention force $f$ that deviates from the target
value entails a loss proportional to

$$(f-\tau)^2\, p(f)\, df$$

Summing this loss over the range of $f$, we have loss directly
proportional to the mean squared error (MSE) of $f$:

$$\text{Loss} \propto \int (f-\tau)^2\, p(f)\, df = \mathrm{MSE}(f)$$

The mean squared error (MSE) can further be decomposed into the
square of the bias, a measure of how far the mean is off target,
and the variance, the spread of the response about its mean. Our
aim is to get the mean on target and the variance to a minimum.
Taguchi called the procedure of searching for a design with this
property parameter design. In traditional optimization, it is
called equality constraint optimization. So what I am doing is
just transferring technology. Whereas the Taguchi method
implements the concept in the development phase through design of
experiments in the lab, I would implement the same concept in the
design phase through optimization in the computer.

The basic requirement is that we must know the relation between
the response $f$ and the factors $x$ which control or affect $f$.
In the case of the wheelcover, the response is the retention
force, and the factors are the clip stiffness $x_1$, the clip
diameter $x_2$ and the rim diameter $x_3$. The key is to expand
the response function about the nominal values of the factors.
You can then derive in closed form the bias and variance of the
response in terms of the nominal values of the factors $x$. In
these equations, $g_i$ is the gradient of $f$ with respect to the
factor $x_i$; $h_{ij}$ is the Hessian of $f$ with respect to
$x_i$, $x_j$; $\mu_i$ is the mean or nominal value of $x_i$; and
$\sigma_{ij}$ is the variance-covariance matrix of $x_i$, $x_j$.

These equations relating bias and variance to the factors $x$ are
then submitted to an optimizer to search for nominal values of
$x$ that ensure the response is on target and with minimum
variance.
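The slide carrying these equations is not reproduced in the transcript. As a
hedged reconstruction, using the symbols defined in the talk and following
Eqs (3) to (5) of the background article, a second-order expansion of $f$
about the nominal values gives:

```latex
\begin{aligned}
\mathrm{bias}(f) &= E(f)-\tau
  = f(\mu)-\tau+\tfrac{1}{2}\sum_{i}\sum_{j} h_{ij}(\mu)\,\sigma_{ij},\\
\mathrm{var}(f) &= \sum_{i}\sum_{j} g_i(\mu)\,g_j(\mu)\,\sigma_{ij}.
\end{aligned}
```

The bias depends on the nominal values through $f(\mu)$ and the curvature
$h_{ij}$, while the variance depends on them only through the gradient $g_i$,
which is why the gradient plays the role of the design's sensitivity.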


I would like to point out again the difference between the
traditional design and the variation minimum design.
Traditionally, the design engineer takes a deterministic approach
and uses the bias relation to find the set of nominal values
$\mu$ that ensures the response is on target. In so doing, he
fixes the sensitivity $g_i(\mu)$ of the design to the variation
of mass production. The manufacturing engineer, if he wants to
improve the quality, has no other recourse but to tighten the
tolerance $\sigma_{ij}$, which generally is expensive. By
contrast, in variation minimum design, one tries to find the set
of nominal values that ensures not only that the response is on
target but also that the variance is a minimum. And so quality
is achieved at no cost because you don't have to tighten the
tolerance. Indeed, it is possible to find a set of nominal
values that even allows you to open up the tolerance. At that
stage, we attain the 'better for less' situation.

Let me demonstrate how this approach is implemented in another
real life example: design a door hanging process that yields
consistent door closing effort.

[Slide: AN EXAMPLE. Design a door hanging process that yields
consistent door closing effort.]

Door closing effort in a car is spent mostly in overcoming air
resistance. Thus, much less effort is needed to close a door
with the window down than with the window up. We can reduce air
resistance by closing the door slowly. However, the door needs
to attain a certain velocity at closing for it to have the
necessary amount of kinetic energy to compress the weatherstrip
and effect a seal. The primary variable associated with door
closing effort, therefore, is the energy stored in the deformed
weatherstrip. A reduction in stored energy (SE) means a reduced
door velocity needed at closing, which translates to a dramatic
decrease in air resistance and door closing effort.

With the door closed, the weatherstrip is compressed between the
body and the door around the door periphery as shown in this
figure. The car to car variation in SE comes from the car to car
variation in the diameter of the weatherstrip and in the gap
between the door and the body. In turn, the variation in the gap
comes from the variation in the build of both the body and the
door and in the positioning of the door with respect to the body
during hanging. For purpose of illustration, we consider only
the variation due to hanging.

Take, for example, one car line currently being assembled. The
door is positioned with respect to the door opening by locating
three points (1,2,3) on the door at 869.26, 876.87 and 813.00 mm,
respectively, from the centerline of the car in the cross car
direction as shown in this figure. These points correspond to
the hinge and latch locations on the door. Once the door is
positioned, hinges are screwed on to the door and the pillar.
Suppose in positioning the door, the points (1,2,3) are in error
by no more than 0.5 mm. How much deviation from the nominal
value of the door closing effort do these errors produce? For
the same tolerance of 0.5 mm, is there a trio of nominal cross
car positions $\mu = [\mu_1, \mu_2, \mu_3]$ of the points (1,2,3)
at which the door closing effort is the least sensitive to the
errors in positioning?

For example, this figure is a plot of the equation depicting how
SE varies with $x_1$, the position of the upper hinge. The lower
hinge and the latch are fixed at their current nominal positions.
The sensitivity of SE to hanging is now apparent. With $x_1$ at
its current nominal position of 869.26 mm, a 1.0 mm deviation in
$x_1$ produces a deviation in SE of about 0.25 N-m. By contrast,
with $x_1$ at, say, 873.00 mm, the same amount of deviation in
$x_1$ produces a deviation in SE of only about 0.1 N-m. It is
this potential for desensitizing a design to the variation in
manufacturing that we try to exploit: instead of trying to
tighten the 1.0 mm deviation to a smaller value, we reconfigure
the design, positioning $x_1$ at 873 mm, to render the design
insensitive to variation.
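As a rough worked check of what these two readings imply, assume that only the
upper hinge position varies and that a first-order (gradient only) approximation
holds, so that the variance of SE scales with the square of the local slope.
The slopes below are simply the quoted deviations per 1.0 mm:

```latex
\begin{aligned}
g_1(869.26) &\approx 0.25\ \text{N-m/mm}, \qquad g_1(873.00) \approx 0.1\ \text{N-m/mm},\\
\frac{\mathrm{var}(SE)\big|_{873.00}}{\mathrm{var}(SE)\big|_{869.26}}
  &\approx \left(\frac{0.1}{0.25}\right)^{2} = 0.16 .
\end{aligned}
```

Under these assumptions, the same positioning tolerance would produce roughly
one sixth of the variance in SE at the alternative hinge position.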


[Figure: Stored energy (closing effort), N-m, versus upper hinge
location.]

To analyze this problem, we first derive from physical
considerations the response SE as a function of the factors
$x = [x_1, x_2, x_3]$, the cross car positions of points (1,2,3).
In diametral compression, the weatherstrip exhibits a linear
force-deflection relationship. Therefore, for a weatherstrip of
diametral stiffness $K$, diameter $D$ and length $L$, the function
SE(x) is:

$$SE(x) = \tfrac{1}{2}\int_{L} K\,[D-G(s,x)]^2\, ds, \quad \text{for } D > G;$$

where $G$ is the gap, which varies along the door periphery $s$.

We then discretize the weatherstrip into 15 segments. The
integral may now be approximated by a summation, and SE(x) for
a given x may be computed:

$$SE(x) = \tfrac{1}{2} K \sum_{i=1}^{15} [D-G_i(x)]^2 L_i;$$

where $L_i$ is the length of segment $i$ and, nominally,
$K$ = 0.012 N/mm/mm and $D$ = 19 mm.
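The discretized sum is straightforward to evaluate once the gaps are known. The
sketch below uses the nominal K and D quoted in the talk; the gap and segment
length values in the usage lines are hypothetical, and treating a segment with
a gap at or beyond D as storing no energy is one reading of the "for D > G"
condition.

```python
K_NOMINAL = 0.012   # diametral stiffness per unit length, N/mm/mm (from the talk)
D_NOMINAL = 19.0    # weatherstrip diameter, mm (from the talk)


def stored_energy(gaps_mm, lengths_mm, k=K_NOMINAL, d=D_NOMINAL):
    """Return 0.5 * K * sum((D - G_i)^2 * L_i) in N-mm.

    Segments whose gap is at least D are assumed uncompressed and are skipped.
    """
    if len(gaps_mm) != len(lengths_mm):
        raise ValueError("gaps and lengths must have the same number of segments")
    total = 0.0
    for gap, length in zip(gaps_mm, lengths_mm):
        compression = d - gap
        if compression > 0.0:
            total += 0.5 * k * compression ** 2 * length
    return total


# Hypothetical door: 15 equal segments of 250 mm, each with a 13 mm gap.
se_n_mm = stored_energy([13.0] * 15, [250.0] * 15)
print(f"stored energy: {se_n_mm / 1000.0:.2f} N-m")
```

With millimetre inputs the sum comes out in N-mm, so it is divided by 1000 to
compare with the N-m scale used in the figure.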

The summation formula for SE(x) completely describes the
relationship between the response SE and the factors x.

[Slide: STORED ENERGY (SE) IN WEATHERSTRIP]

$$SE(x) = \tfrac{1}{2}\int_{L} K\,[D-G(s,x)]^2\, ds \approx \tfrac{1}{2} K \sum_{i} [D-G_i(x)]^2 L_i$$

We now exploit the full potential by considering simultaneously
all three nominal positions $\mu = [\mu_1, \mu_2, \mu_3]$ of x.
Avoiding drastic departure from the current positions
$\mu$ = [869.26, 876.87, 813.00], we search within a 2 mm
neighborhood for a position $\mu^*$ at which the mean value of
SE($\mu^*$) is the same as that of the current position and the
variance of SE($\mu^*$) is minimized. Casting this in the context
of optimization, we submitted the problem to a standard
optimization routine and found the solution
$\mu^*$ = [870.27, 874.87, 811.00]. The result is shown in this
figure. For the same 0.5 mm tolerance allowed in $\mu$, the
variance of SE at the optimum position is only a third of that at
the current position.
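The door coefficients are not given in the talk, so the search can only be
illustrated on a stand-in. The sketch below uses a hypothetical two-factor
quadratic response (all coefficients invented for illustration), written in
offsets from the current position, and replaces the "standard optimization
routine" with a plain scan along the constraint curve. Because the curvature of
a quadratic is constant, holding the nominal response equal to its current
value also holds the mean equal.

```python
import math

# Hypothetical quadratic response in offsets z = x - x_current (mm).
A0, B1, B2 = 2.0, 0.25, 0.10
C11, C22, C12 = -0.03, 0.02, 0.01
SIGMA2 = 0.25 ** 2   # hypothetical variance of each positioning error, mm^2


def response(z1, z2):
    return A0 + B1 * z1 + B2 * z2 + C11 * z1 * z1 + C22 * z2 * z2 + C12 * z1 * z2


def variance(z1, z2):
    """First-order variance for independent factors: sum of g_i^2 * sigma^2."""
    g1 = B1 + 2 * C11 * z1 + C12 * z2
    g2 = B2 + 2 * C22 * z2 + C12 * z1
    return (g1 * g1 + g2 * g2) * SIGMA2


def feasible_z2(z1, target):
    """Real roots z2 of response(z1, z2) = target (the equality constraint)."""
    a = C22
    b = B2 + C12 * z1
    c = A0 + B1 * z1 + C11 * z1 * z1 - target
    disc = b * b - 4 * a * c
    if disc < 0:
        return []
    root = math.sqrt(disc)
    return [(-b + root) / (2 * a), (-b - root) / (2 * a)]


def search(radius=2.0, steps=400):
    """Scan z1, solve the constraint for z2, keep the smallest variance."""
    target = response(0.0, 0.0)
    best = (variance(0.0, 0.0), 0.0, 0.0)
    for k in range(steps + 1):
        z1 = -radius + 2 * radius * k / steps
        for z2 in feasible_z2(z1, target):
            if abs(z2) <= radius:
                best = min(best, (variance(z1, z2), z1, z2))
    return best


best_var, best_z1, best_z2 = search()
print(f"offsets: ({best_z1:.2f}, {best_z2:.2f}) mm, "
      f"variance ratio: {best_var / variance(0.0, 0.0):.2f}")
```

Eliminating the constraint by solving for one factor keeps the example free of
external libraries; a general-purpose constrained optimizer would serve the
same purpose for the three-factor door problem.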

The significance of this example is not so much in the results
but in the fact that the insensitive design concept can be
integrated into the CAD/CAM environment, which permits a
widespread, early and upfront implementation of the concept.

To summarize, let me review these key points:

•  that  in  mass  production,  variation  is  a  fact  of  life; 

•  that  we  should  not  try  to  fight  variation  head  on,  but 
instead  make  our  design  insensitive  to  it; 

•  that  in  doing  so,  our  primary  motive  is  'better  for  less'; 

•  and finally, that a framework has been developed which
relates customer expectation to product/process design and
allows a widespread, early and upfront implementation.

[Slide: CONCLUSIONS]

VARIATION IS HERE FOREVER;
DESENSITIZE THE DESIGN;
PRIMARY MOTIVE IS "BETTER FOR LESS";
FRAMEWORK DEVELOPED WHICH
•  RELATES CUSTOMER EXPECTATION TO PRODUCT/PROCESS VARIABLES;
•  ALLOWS INTEGRATION INTO CAD/CAM & THUS WIDESPREAD AND UPFRONT
IMPLEMENTATION.

Ladies and gentlemen, in our business of mass production we look
upon variation as an evil because it is the root cause of poor
quality and unreliability. There is, however, one good thing
about variation: it is blind. It does not discriminate Ford from
GM from Toyota. It affects everyone and exempts no one. For
that reason we should view variation not as a problem, but as an
opportunity to use it to our advantage and gain a competitive
edge over our competition.

BACKGROUND ARTICLE


VARIATION TOLERANT DESIGN

H. L. Oh
C-P-C Group
General Motors Corporation
Warren, Michigan

ABSTRACT 

Taguchi  philosophy  of  robust  design  is  formulated  in  the 
context  of  optimization.  In  particular,  the  mean  squared  error  of 
a  design  performance  is  derived  explicitly  in  terms  of  the  design 
parameters.  This  permits  the  strategic  choice  of  the  parameters 
that  minimize  the  mean  squared  error  of  the  performance  early  at 
the  design  phase.  Thus,  Taguchi  philosophy  will  achieve  an  even 
greater  impact  on  quality  improvement  as  its  implementation  is 
shifted  from  the  usual  domain  of  experimental  development  to  the 
early  phase  of  analytical  design. 

A  real  life  problem  is  used  to  illustrate  the  implementation 
of  the  formulation:  design  a  door  hanging  process  in  car  assembly 
that  yields  consistent  door  closing  effort. 

INTRODUCTION 

Quality  improvement  activities  achieve  their  maximum  impact 
when  they  are  carried  out  up  front  in  the  product  realization 
process.  Thus,  the  Taguchi  philosophy  of  shifting  quality  control 
from  assembly  line  inspection  to  pre-production  experimentation  is 
a  significant  step  in  this  direction.  Another  contribution  of 
Taguchi  is  the  philosophy  of  robust  design:  do  not  try  to  control 
the  sources  of  variation  affecting  the  design;  make  the  design 
insensitive  to  these  sources  instead. 

To  implement  the  Taguchi  philosophy,  one  usually  employs  a 
planned  experimental  program  to  acquire  knowledge  about:  (1)  the 
relationship  between  the  design  response  and  the  factors  affecting 
the  response;  and  (2)  the  variability  in  the  design  response  caused 
by  the  variability  in  the  factors.  Based  on  the  knowledge  (1) 
acquired,  one  then  sets  the  factors  to  some  nominal  values  such 
that  the  average  value  of  the  design  response  equals  a  target  value 
while  the  variance  of  the  design  response  in  (2)  is  at  a  minimum. 

In this way, one has achieved a variation minimum or variation
tolerant design. A variation tolerant design may not achieve the
optimal performance level, but it will have performance
consistency. It is the latter that is to be emphasized to achieve
robust design.1

The strategy as described above is well known in the field of
optimization and is called equality constraint minimization [1].2,3
If the relationship between the response and the factors is known,
then as pointed out in [2,3], variation tolerant design can be
accomplished by implementing the Taguchi philosophy solely through
numerical optimization. No experimentation is needed. This is the
motivation of this paper: to show that traditional optimization
techniques can be applied in the design phase for the design of
performance consistency, as opposed to the Taguchi method, which
tries to achieve performance consistency in the development phase
through the design of experiments.

There  are  several  compelling  reasons  to  implement  the  Taguchi 
philosophy  of  robust  design  through  numerical  optimization.  First, 
greater  impact  can  be  achieved  since  we  are  taking  another  step 
further  up  front;  i.e.,  from  experimental  development  to  analytical 
design. Second, there already exists in the engineering sciences a
vast amount of knowledge relating design response to design
factors. These relationships are either known or can be derived
through a simple application of the physical laws or through
sophisticated modelling such as finite element modelling. By
tapping into this existing knowledge and invoking the Taguchi
strategy through optimization, more design alternatives can be
explored in a shorter time, and many variation tolerant designs
can be achieved with little experimentation. Finally, with more
and more products and processes now being designed with
computer-aided engineering (CAE), using optimization to implement
the Taguchi philosophy permits the integration of the Taguchi
philosophy into CAE and allows the full realization of
computer-aided robust design. In the next section,
we  devise  a  method  which  permits  the  implementation  of  the  Taguchi 
philosophy  of  robust  design  in  the  context  of  optimization.  In  the 
last  section,  we  use  a  real  life  example  to  illustrate  the 
implementation:  design  a  door  hanging  process  in  car  assembly  that 
yields  consistent  door  closing  effort. 

FORMULATION 

Let $f$ denote the value of the design response of interest and
$\tau$ its target. As mentioned earlier, the dependency of $f$ on
the design factors $x = [x_1, x_2, \ldots, x_n]$ is either known
to us or can be derived. Because of the variability in $x$, $f$
would exhibit a random deviation from $\tau$. Let the mean squared
error (MSE) be a measure of this deviation. Then,

$$\mathrm{MSE}(f) = E\{[f-\tau]^2\} = E\{[f-E(f)]^2\} + [E(f)-\tau]^2. \qquad (1)$$

1 Author acknowledges these terse statements by one discusser.

2 Numbers in brackets refer to references at the end of the text.

3 Bold symbols are vector quantities unless otherwise specified.

The symbol $E$ denotes expectation, or averaging over $x$:

$$E(\cdot) = \int (\cdot)\, p(x)\, dx,$$

where $p(x)$ is the joint probability density function of the
design factors $x$. On the right hand side of Eq (1), the first
term is the variance of $f$. This is a measure of the variation of
$f$ from its mean. The second term is the bias squared. This is a
measure of the deviation of the mean value of $f$ from the target.
It is a fixed error affecting every design. Our aim is to minimize
MSE($f$) by a choice of the mean values
$\mu = [\mu_1, \mu_2, \ldots, \mu_n]$ of $x$ that:

ensures zero bias

$$E(f) - \tau = 0;$$

and minimizes variance $E\{[f-E(f)]^2\}$.
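The decomposition in Eq (1) holds for any sample as well as for a distribution,
which makes it easy to check numerically. The sketch below uses a hypothetical
sample of retention forces (the mean, spread and target are illustrative values
only) and the population form of the variance.

```python
import random
import statistics


def mse_decomposition(values, target):
    """Return (mse, variance, bias) for a sample, using population variance."""
    mean = statistics.fmean(values)
    mse = statistics.fmean([(v - target) ** 2 for v in values])
    variance = statistics.fmean([(v - mean) ** 2 for v in values])
    bias = mean - target
    return mse, variance, bias


rng = random.Random(1)
TARGET = 34.58                                         # illustrative target, N
sample = [rng.gauss(36.0, 2.0) for _ in range(5000)]   # hypothetical off-target process

mse, variance, bias = mse_decomposition(sample, TARGET)
print(f"MSE                = {mse:.4f}")
print(f"variance + bias**2 = {variance + bias ** 2:.4f}")
```

The two printed numbers agree to rounding error, and they show separately how
much of the loss comes from being off target and how much from spread.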

Taguchi called the above strategy parameter design. In
optimization, it is called equality constraint minimization. If
p(x) is known, e.g., a normal distribution with known standard
deviations but with the means as parameters yet to be determined,
the above strategy could be carried out numerically. The
computation, however, is simplified considerably, and no knowledge
of p(x) is required, if we expand f(x) about $\mu$ in a Taylor
series and neglect terms of order three or higher:

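$$f(x) = f(\mu) + \sum_{i=1}^{n} g_i(\mu)(x_i-\mu_i) + \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} h_{ij}(\mu)(x_i-\mu_i)(x_j-\mu_j) \qquad (2)$$

where $g_i(\mu)$, $h_{ij}(\mu)$ are the partial derivatives
$\partial f/\partial x_i$, $\partial^2 f/\partial x_i \partial x_j$
evaluated at $\mu = [\mu_1, \mu_2, \ldots, \mu_n]$. Since
$E(x_i) = \mu_i$, the mean value of f is

$$E(f) = f(\mu) + \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} h_{ij}(\mu)\,\sigma_{ij} \qquad (3)$$

and the bias and variance of f may be computed from Eqs (2,3) as
follows:

$$\mathrm{bias}(f) = E(f)-\tau = f(\mu) - \tau + \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} h_{ij}(\mu)\,\sigma_{ij}; \qquad (4)$$

$$\mathrm{var}(f) = E\{[f-E(f)]^2\} = E\Big\{\Big[\sum_{i=1}^{n} g_i(\mu)(x_i-\mu_i)\Big]^2\Big\} = \sum_{j=1}^{n}\sum_{i=1}^{n} g_i(\mu)\,g_j(\mu)\,\sigma_{ij}. \qquad (5)$$

In the above equations, $\sigma_{ij}$ denotes the
variance-covariance of $(x_i, x_j)$, given by

$$\sigma_{ij} = E[(x_i-\mu_i)(x_j-\mu_j)] = \begin{cases} \mathrm{var}(x_i), & \text{for } i = j;\\ \mathrm{cov}(x_i, x_j), & \text{for } i \neq j. \end{cases}$$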
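Eqs (4) and (5) translate directly into a few lines of code. The sketch below
applies them to the wheelcover response, taken here as stiffness times the
difference between clip and rim diameters; that functional form, the absolute
diameters, and all standard deviations are assumptions for illustration, with
only the 5.2 N/mm stiffness and 6.65 mm difference coming from the talk.

```python
def bias_and_variance(f_at_mu, grad, hess, cov, target):
    """Evaluate Eq (4) and Eq (5) from the gradient, Hessian and covariance."""
    n = len(grad)
    curvature = 0.5 * sum(hess[i][j] * cov[i][j]
                          for i in range(n) for j in range(n))
    bias = f_at_mu - target + curvature                       # Eq (4)
    variance = sum(grad[i] * grad[j] * cov[i][j]
                   for i in range(n) for j in range(n))       # Eq (5)
    return bias, variance


def wheelcover_terms(mu):
    """Response, gradient and Hessian of f = x1 * (x2 - x3) at the nominals mu."""
    k, clip, rim = mu
    f = k * (clip - rim)
    grad = [clip - rim, k, -k]
    hess = [[0.0, 1.0, -1.0],
            [1.0, 0.0, 0.0],
            [-1.0, 0.0, 0.0]]
    return f, grad, hess


mu = [5.2, 356.65, 350.0]          # stiffness N/mm; diameters mm (hypothetical)
cov = [[0.2 ** 2, 0.0, 0.0],       # independent factors with hypothetical
       [0.0, 0.3 ** 2, 0.0],       # standard deviations of 0.2 N/mm, 0.3 mm
       [0.0, 0.0, 0.2 ** 2]]       # and 0.2 mm
f_mu, grad, hess = wheelcover_terms(mu)
bias, variance = bias_and_variance(f_mu, grad, hess, cov, target=5.2 * 6.65)
print(f"bias = {bias:.4f} N, variance = {variance:.4f} N^2")
```

Because the factors are taken as independent and the Hessian has no diagonal
terms, the curvature correction in Eq (4) vanishes here and the bias is just
the deterministic offset from target.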

Several philosophical points about variation tolerant design
are evident from Eqs (4,5). In product or process design, the
usual practice is to choose a set of $\mu$ such that the response
f is on target. This is usually done based on deterministic
calculations. Once the choice is made, the gradient $g_i(\mu)$ in
Eq (5) is fixed. This fixes the variance of f. If the variance of
f is unacceptable, the design engineer would then attempt to
reduce it by tightening $\sigma_{ij}$, the variance-covariances of
the design factors in Eq (5). This approach generally requires
more precise processing and skillful labor. It is a more expensive
approach. Thus, quality is achieved at extra cost. Hopefully, this
extra cost will be recovered in the form of reduced warranty cost
and improved customer satisfaction.

By contrast, variation tolerant design attempts to find a set of
$\mu$ not only to ensure that the response is on target but also
to guarantee that the gradient $g_i(\mu)$ in Eq (5) is a small
value. This would reduce the variability in the response f at no
extra cost since there is no attempt to tighten $\sigma_{ij}$.
Therefore, quality is achieved at no additional cost. Indeed,
there exists the possibility of finding a set of $\mu$ which
reduces $g_i(\mu)$ to such a low level that $\sigma_{ij}$ may be
opened up without increasing the variability in the response f.
In this case, manufacturing becomes less costly and quality is
achieved at extra profit. In short, the primary motivation in
variation tolerant design is profit; warranty cost reduction and
improved customer satisfaction are secondary by-products.

A related point to the above discussion is the following. While
the usual design practice is to first set nominal values of the
design factors based on deterministic calculations and then let
the design engineer and the production people negotiate on what
the tolerances of these factors should be, the more profitable
approach is to first set the tolerances of these factors at
values that minimize the cost and then implement the variation
tolerant design to choose the nominal values that ensure design
response on target at minimum variability. As a corollary, the
design engineer must know $\sigma_{ij}$, the capability of
manufacturing, before he can carry out a variation tolerant
design.

AN  EXAMPLE 

Door closing effort in a car is spent mostly in overcoming air
resistance. Thus, much less effort is needed to close a door with
the window down than with the window up. We can reduce air
resistance by closing the door slowly. However, the door needs to attain
a  certain  velocity  at  closing  for  it  to  have  the  necessary  amount 
of  kinetic  energy  to  compress  the  weatherstrip  and  effect  a  seal. 
The  primary  variable  associated  with  door  closing  effort, 
therefore,  is  the  energy  stored  in  the  deformed  weatherstrip.  A 
reduction  in  stored  energy  (SE)  means  a  reduced  door  velocity 
needed  at  closing  which  translates  to  a  dramatic  decrease  in  air 
resistance  and  door  closing  effort. 

With the door closed, the weatherstrip is compressed between
the body and the door around the door periphery, Figure 1. The car
to car variation in SE comes from the car to car variation in the
diameter of the weatherstrip and in the gap between the door and
the body. In turn, the variation in the gap comes from the
variation in the build of both the body and the door and in the
positioning of the door with respect to the body during hanging.
For purpose of illustration, we consider only the variation due to
hanging.

Figure 1. Gap 'G' around the door periphery.

Take,  for  eAumple,  one  car  line  currently  being  assembled. 

The  door  is  positioned  with  respect  to  the  door  opening  by  locating 
three  points  (1,2,3)  on  the  door  at  869.26,  876.87  and  813.00  mm, 
respectively,  from  the  centerline  of  the  car  in  the  cross  car 
direction.  Figure  2.  These  points  correspond  to  the  hinges  and 
latch  locations  on  .the  door.  Once  the  door  is  positioned,  hinges 
are  screwed  on  to  the  door  and  the  pillar.  Suppose  in  positioning 
the  door,  the  points  (1,2,3)  are  in  error  by  no  more  than  0.5  mm. 
How  much  deviation  from  the  nominal  value  of  the  door  closing 
effort  do  these  errors  produce?  For  the  same  tolerance  of  0.5  mm, 
is  there  a  trio  of  nominal  cross  car  positions  u*[u.,u,,u.]  of  the 
points  (1,2,3)  at  which  positions  the  door  closing  effort^is  the 
least  sensitive  to  the  errors  in  positioning? 
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Figure  2.  Positioning  of  the  door  to  the  opening 


To  analyze  this  problem,  we  first  derive  from  a  physical 
consideration  the  response  SE  as  a  function  of  the  factors 


x«(x1 ,x, ,x. ] ,  the  cross  car  position  of  points  (1,2,3).  In  a 
diametral  compression,  the  weatherstrip  exhibits  a  linear 


force-deflection  relationship.  Therefore,  for  a  weatherstrip  of 
diametral  stiffness  K,  diameter  D  and  length  L,  the  function  SE(x) 


SE( x)  «  j/kCD-G(s,x)  l2ds  ,  for  D  >  G;  (6) 

<K 

where  G  is  the  gap  which  varies  along  the  door  periphery  s.  Since 
only  small  rotations  are  involved  in  positioning  the  door,  the  gap 
would  be  linearly  proportional  to  x.  Therefore  upon  integration, 
the  SE(x)  will  be  a  quadratic  function  of  x: 


SE(*.  -  aQ 


The  coefficients  aQ,  bx,  and  c^  are  estimated  as  follows. 
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Cartesian  coordinates  of  two  sets  of  points  are  digitized  from 
the  blueprints.  One  set  of  15  points  is  on  the  body  spaced  around 
the  door  opening  at  locations  indicated  by  the  arrows,  Figure  2. 

The  other  set  of  equal  number  of  points  is  on  the  door  directly 
opposite  to  tht  1  j  points  on  the  body.  The  set  on  the  body  is  held 
fixed  while  the  set  on  the  door  moves  with  the  locating  points 
(1,2,3)  as  a  rigid  body  as  the  locating  points  are  positioned  to  an 
x  value.  The  coordinates  of  the  ith  point  on  the  door  as  moved  can 
be  computed  from  its  initial  coordinates  and  the  x  values.  There¬ 
fore,  the  distance  between  the  ith  point  on  the  door  and  the 
opposing  point  on  the  body  can  be  computed.  This  distance 
represents  the  gap  G  at  that  location  of  the  door  periphery. 
Approximating the integral in Eq (6) by summation, the SE(x) for a
given x may be computed:

$$ SE(x) = \sum_{i=1}^{15} \tfrac{1}{2} K \, [D - G_i(x)]^{2} \, L_i; \qquad (8) $$

where nominally, K = 0.012 N/mm/mm, D = 19 mm, and L_i is the length
of the weatherstrip between two points: the midpoint of points
(i-1) and i, and that of points i and (i+1).
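
As a rough illustration of the summation in Eq (8), the sketch below computes SE from a list of gaps and segment lengths. Only K and D come from the text. The gap and length values are invented for illustration (the digitized blueprint coordinates are not reproduced here), the function name is hypothetical, and the factor of one half is the usual stored energy of a linear spring.

```python
K = 0.012   # diametral stiffness, N/mm per mm of weatherstrip (from the text)
D = 19.0    # weatherstrip diameter, mm (from the text)


def stored_energy(gaps_mm, lengths_mm, k=K, d=D):
    """Discretized Eq (8): sum of 0.5 * k * (d - g_i)^2 * L_i, returned in N-m."""
    total_n_mm = 0.0
    for gap, seg_len in zip(gaps_mm, lengths_mm):
        compression = max(d - gap, 0.0)   # energy is stored only where D > G
        total_n_mm += 0.5 * k * compression ** 2 * seg_len
    return total_n_mm / 1000.0            # convert N-mm to N-m


# Hypothetical data: 15 stations, a uniform 12 mm gap, 250 mm of strip each.
gaps = [12.0] * 15
lengths = [250.0] * 15
se_uniform = stored_energy(gaps, lengths)
print(f"SE = {se_uniform:.4f} N-m")
```

In the procedure described above, the gaps would instead be recomputed from the rigid-body motion of the door points for each trial x.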

Using Eq (8), the SE values for twenty seven sets of x,
covering the realistic ranges of x, are generated. These are shown
in Table I. Using Eq (7) and data in Table I, a regression of SE
on x is then made to estimate the coefficients a_0, b_i, and c_ij.
The results are shown in Table II. These coefficients, together
with Eq (7), completely describe the relationship between the
response SE and the factors x.

Figure 3. Stored energy versus upper hinge location. (Vertical axis:
stored energy (closing effort), N-m; horizontal axis: location of
point (1), 865 to 875 mm.)


For example, Figure 3 is a plot of Eq (7) depicting how SE
varies with x_1, the position of the upper hinge. The lower hinge
and the latch are fixed at their current nominal positions. The
sensitivity of SE to hanging is now apparent: with x_1 at its
current nominal position of 869.26 mm, a 1.0 mm deviation in x_1
produces a deviation in SE of about 0.25 N-m. By contrast, with x_1
at, say, 873.00 mm, the same amount of deviation in x_1 produces only
a deviation in SE of about 0.1 N-m. It is this potential for
desensitizing a design to sources of variation that we try to
exploit: instead of trying to tighten the 1.0 mm deviation to a
smaller value, we reconfigure the design, positioning x_1 at 873.00
mm, to render the design insensitive to variation.

We now exploit the full potential by considering simultaneously
all three nominal positions u = [u_1, u_2, u_3] of x. Avoiding
drastic departure from the current positions
u^c = [869.26, 876.87, 813.00], we search within a 2 mm neighborhood for
a position u* at which the mean value of SE(u*) is the same as that
of the current; and the variance of SE(u*) is the least. Assuming
no correlations among the factors, we have:

$$ \sigma_{ij} = \begin{cases} (0.5/3)^{2} & \text{for } i = j = 1, 2, 3, \\ 0 & \text{for } i \neq j. \end{cases} $$


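Substituting the values of σ_ij into Eqs (4,5,7) and simplifying,
we have for a given u, the bias and variance of SE(u) as
follows:

$$ \mathrm{bias}[SE(u)] = E[SE(u)] - E[SE(u^c)] = \sum_{i=1}^{3} b_i (u_i - u_i^c) + \sum_{i=1}^{3} \sum_{j=i}^{3} c_{ij} (u_i u_j - u_i^c u_j^c) $$

$$ \mathrm{var}[SE(u)] = \sum_{i=1}^{3} \Big( b_i + c_{ii} u_i + \sum_{j=1}^{3} c_{ij} u_j \Big)^{2} \sigma_{ii}, \quad c_{ji} = c_{ij} $$

Cast in the context of optimization, we submit the following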
problem to a standard optimization routine: search for u* that

minimizes
$$ \sum_{i=1}^{3} \Big( b_i + c_{ii} u_i + \sum_{j=1}^{3} c_{ij} u_j \Big)^{2} \sigma_{ii}, \quad \text{with } c_{ji} = c_{ij}; $$

and satisfies
$$ \sum_{i=1}^{3} b_i (u_i - u_i^c) + \sum_{i=1}^{3} \sum_{j=i}^{3} c_{ij} (u_i u_j - u_i^c u_j^c) = 0; $$

and satisfies
$$ u^c - 2\ \text{mm} \le u \le u^c + 2\ \text{mm}. $$

The solution is u* = [870.27, 874.87, 811.00].

Figure 4 shows histograms which display the results of Monte
Carlo simulations of the door hanging process using the current u^c
and the optimum u* positions of x. For the same 0.5 mm tolerance
allowed in u, the variance of SE at optimum position is only a
third of that at current position.
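
The comparison between the current and optimum positions can be checked with a short calculation. The sketch below is only an illustration: it assumes the quadratic of Eq (7) with the Table II estimates (x in metres), uses a first-order variance built from the gradient of SE in place of the Monte Carlo simulation, and takes σ = 0.5/3 mm for each locating point. The function names are hypothetical.

```python
# Table II estimates for the quadratic of Eq (7); x in metres, SE in N-m.
A0 = 3821.457
B = [-19810.583, 23463.203, -13062.571]
# Coefficients c_ij for i <= j, keyed by zero-based index pairs.
C = {(0, 0): 42582.511, (1, 1): 40061.787, (2, 2): 7169.969,
     (0, 1): -85552.756, (0, 2): 25113.789, (1, 2): -23579.615}

SIGMA2 = (0.5e-3 / 3.0) ** 2   # assumed variance of each position, m^2


def stored_energy(x):
    """Evaluate the fitted quadratic SE(x)."""
    se = A0 + sum(b * xi for b, xi in zip(B, x))
    se += sum(c * x[i] * x[j] for (i, j), c in C.items())
    return se


def gradient(x):
    """Partial derivatives of SE with respect to x_1, x_2, x_3."""
    g = list(B)
    for (i, j), c in C.items():
        if i == j:
            g[i] += 2.0 * c * x[i]
        else:
            g[i] += c * x[j]
            g[j] += c * x[i]
    return g


def first_order_variance(u):
    """Sum of (dSE/dx_i)^2 * sigma_ii, assuming uncorrelated factors."""
    return sum(gi * gi * SIGMA2 for gi in gradient(u))


u_current = [0.86926, 0.87687, 0.81300]
u_optimum = [0.87027, 0.87487, 0.81100]

bias = stored_energy(u_optimum) - stored_energy(u_current)
ratio = first_order_variance(u_optimum) / first_order_variance(u_current)
print(f"bias = {bias:+.4f} N-m, variance ratio = {ratio:.2f}")
```

Because the coefficients are large and nearly cancel, the result is sensitive to rounding of the inputs; the positions are used here exactly as quoted in the text.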
Figure 4. Histograms of stored energy (closing effort) for
door hung at current and at optimum position.

TABLE I

Computed Stored Energy (SE)
For Various x = [x_1, x_2, x_3]

x_1 (m)    x_2 (m)    x_3 (m)
0.86926    0.87687    0.8130
0.86926    0.87687    0.8090
0.86926    0.87687    0.8170
0.86926    0.88087    0.8130
0.86926    0.88087    0.8090
0.86926    0.88087    0.8150
0.86926    0.87587    0.8130
0.86926    0.87287    0.8090
0.86926    0.87587    0.8170
0.87226    0.87687    0.8130
0.87226    0.87687    0.8090
0.67226    0.87687    0.8140
0.87226    0.88087    0.8130
0.87226    0.88087    0.8090
0.87226    0.83087    0.8150
0.87226    0.37637    0.8130
0.87226    0.87587    0.8090
0.87126    0.87587    0.8135
0.86526    0.87687    0.8130
0.86526    0.87687    0.8090
0.86526    0.87687    0.8150
0.86526    0.87887    0.8130
0.86526    0.87887    0.8110
0.86526    0.87787    0.8150
0.86526    0.87287    0.8130
0.86526    0.87287    0.8090
0.86526    0.87287    0.8170


TABLE II

Coefficients Estimated From
The Regression of SE on x

Coefficient   Units     Estimate
a_0           N-m        3821.457
b_1           N        -19810.583
b_2           N         23463.203
b_3           N        -13062.571
c_11          N/m       42582.511
c_22          N/m       40061.787
c_33          N/m        7169.969
c_12          N/m      -85552.756
c_13          N/m       25113.789
c_23          N/m      -23579.615


QUALITY IMPROVEMENT: AN EXPANDING
DOMAIN  FOR  SCIENTIFIC  METHOD 

George  E.  P.  Box 

Data  Analysis  and  Experimental  Design 

Sir  Ronald  Fisher  saw  Statistics  as  the  handmaiden  of  Scientific  Investigation.  When 
he  first  went  to  Rothamsted  in  1919  it  was  to  analyze  data  on  wheat  yields  and  rainfall  that 
extended  back  over  a  period  of  some  seventy  years.  Over  the  next  few  years  Fisher 
developed  great  skill  in  such  analysis,  and  his  remarkable  practical  sense  led  in  turn  to 
important  new  theoretical  results.  In  particular,  least  squares  and  regression  analysis  were 
placed  on  a  much  firmer  theoretical  basis,  the  importance  of  graphical  methods  was 
emphasized  and  aspects  of  the  analysis  of  residuals  were  discussed. 

But  these  studies  presented  him  with  a  dilemma.  On  the  one  hand  such 
"happenstance"  data  were  affected  by  innumerable  disturbing  factors  not  under  his  control. 
On  the  other  hand,  if  agricultural  experiments  were  run  in  the  laboratory,  where  complete 
control  was  possible,  the  conclusions  might  be  valueless,  since  it  would  be  impossible  to 
know  to  what  extent  such  results  applied  to  crops  grown  in  a  farmer’s  field.  The  dilemma 
was  resolved  by  Fisher's  invention  of  statistical  experimental  design  which  did  much  to 
move  science  out  of  the  laboratory  and  into  the  real  world  -  a  major  step  in  human 
progress. 

The  theory  of  experimental  design  that  he  developed,  employing  the  tools  of 
blocking,  randomization,  replication,  factorial  experimentation  and  confounding,  solved  the 
problem of how to conduct valid experiments in a world which is naturally nonstationary
and nonhomogeneous - a world, moreover, in which unseen "lurking variables" are linked
in unknown ways with the variables under study, thus inviting misinformation and
confusion.

The Center for Quality and Productivity cares about your reactions to our reports. Please
send comments (general or specific) to: Report Feedback, Center for Quality and
Productivity Improvement, 610 Walnut Street, Madison, WI 53705. All replies will be
forwarded to the authors.

Thus  by  the  beginning  of  the  1930’s  Fisher  had  initiated  what  is  now  called  statistical 
data  analysis,  had  developed  statistical  experimental  design  and  had  pointed  out  their 
complementary  and  synergistic  nature.  Once  the  value  of  these  ideas  was  demonstrated  for 
agriculture  they  quickly  permeated  such  subjects  as  biology,  medicine,  forestry  and  social 
science. 

The  1930's  were  years  of  economic  depression  and  it  was  not  long  before  the 
potential  value  of  statistical  methods  to  revive  and  reenergize  industry  came  to  be  realized. 
In  particular  at  the  urging  of  Egon  Pearson  and  others,  a  new  section  of  the  Royal 
Statistical  Society  was  inaugurated.  During  the  next  few  years,  at  meetings  of  the 
Industrial and Agricultural Section of the Society, workers from academia and industry met
to  present  and  discuss  applications  to  cotton  spinning,  woollen  manufacture,  glass  making, 
electric  light  manufacture,  and  so  forth.  History  has  shown  that  these  pioneers  were  right 
in  their  belief  that  statistical  method  provided  the  key  to  industrial  progress.  Unhappily 
their  voices  were  not  heard,  a  world  war  intervened,  and  it  was  at  another  time  and  in 
another  country  that  their  beliefs  were  proved  true. 

Fisher  was  an  interested  participant  and  frequent  discussant  at  these  industrial 
meetings.  He  wrote  cordially  to  Shewhart,  the  originator  at  Bell  Labs  of  quality  control. 

He  also  took  note  of  the  role  of  sampling  inspection  in  rejecting  bad  products,  but  he  was 
careful  to  point  out  that  the  rules  then  in  vogue  for  selecting  inspection  schemes  by  setting 
producer's  and  consumer's  risks,  could  not,  in  his  opinion,  be  made  the  basis  for  a  theory 
of  scientific  inference  (Fisher  1955).  He  made  this  point  in  a  critical  discussion  of  the 
theory  of  Neyman  and  Pearson  whose  "errors  of  the  first  and  second  kind"  closely 
paralleled producer's and consumer's risks. He could not have foreseen that fifty years later
the world of quality control would, in the hands of the Japanese, have become the world of
quality improvement in which his ideas for scientific advance using statistical techniques of
design  as  well  as  analysis  were  employed  by  industry  on  the  widest  possible  basis. 

The  revolutionary  shift  from  quality  control  to  quality  improvement  which  had  been 
initiated  in  Japan  by  his  long-time  friend  Dr.  W.  Edwards  Deming  was  accompanied  by 
two important concomitant changes: involvement of the whole workforce in quality
improvement and recognition that quality improvement must be a continuous and
never-ending occupation (see Deming (1986)).


Aspects  of  Scientific  Method 

To  understand  better  the  logic  of  these  changes  it  is  necessary  to  consider  certain 
aspects of scientific method. Among living things mankind has the almost unique ability
of  making  discoveries  and  putting  them  to  use.  But  until  comparatively  recently  such 
technical  advance  was  slow  -  the  ships  of  the  thirteenth  century  were  perhaps  somewhat 
better  designed  than  those  of  the  twelfth  century  but  the  differences  were  not  very 
dramatic.  And  then  three  or  four  hundred  years  ago  a  process  of  quickened  technical 
change  began  which  has  ever  since  been  accelerating.  This  acceleration  is  attributed  to  an 
improved  process  for  finding  things  out  which  we  call  scientific  method. 

We  can,  I  think,  explain  at  least  some  aspects  of  this  scientific  revolution  by 
considering  a  particular  instance  of  discovery.  We  are  told  that  in  the  late  seventeenth 
century  it  was  a  monk  from  the  Abbey  of  Hautvillers  who  first  observed  that  a  second 
fermentation  in  wine  could  be  induced  which  produced  a  new  and  different  sparkling 
liquid,  delightful  to  the  taste,  which  we  now  call  champagne.  Now  the  culture  of  wine 
itself  is  known  from  the  earliest  records  of  man  and  the  conditions  necessary  to  induce  the 
production  of  champagne  must  have  occurred  countless  times  throughout  antiquity. 
However,  it  was  not  until  this  comparatively  recent  date  that  the  actual  discovery  was 
made. This is less surprising if we consider that to induce an advance of this kind two
circumstances must coincide. First an informative event must occur and second a
perceptive observer must be present to see it and learn from it.

Now most events that occur in our daily routine correspond more or less with what
we expect. Only occasionally does something occur which is potentially informative.
Also  many  observers,  whether  through  lack  of  essential  background  knowledge  or  from 
lack  of  curiosity  or  motivation,  do  not  fill  the  role  of  a  perceptive  observer. 

Thus  the  slowness  in  antiquity  of  the  process  of  discovery  can  be  explained  by  the 
extreme  rarity  of  the  chance  coincidence  of  two  circumstances  each  of  which  is  itself  rare. 
It  is  then  easily  seen  that  discovery  may  be  accelerated  by  two  processes  which  I  will  call 
informed  observation  and  directed  experimentation. 

By a process of informed observation we arrange things so that, when a rare
potentially informative event does occur, people with necessary technical background and
motivation are there to observe it. Thus, when last year an explosion of a supernova
occurred, the scientific organization of this planet was such that astronomers observed it
and learned from it. A quality control chart fills a similar role. When such a chart is
properly maintained and displayed it ensures that any abnormality in the routine operation
of a process is likely to be observed and associated with what Shewhart called an
assignable cause - so leading to the gradual elimination of disturbing factors.

A  second  way  in  which  the  rate  of  acquisition  of  knowledge  may  be  increased  is  by 
what I will call directed experimentation. This is an attempt to artificially induce the
occurrence  of  an  informative  event.  Thus,  Benjamin  Franklin's  plan  to  determine  the 
possible  connection  of  lightning  and  electricity  by  flying  a  kite  in  a  thunder  cloud  and 
testing  the  emanations  flowing  down  the  string,  was  an  invitation  for  such  an  informative 
event  to  occur. 

Recognition of the enormous power of these methods of scientific advance is now
commonplace. The challenge of the modern movement of quality improvement is nothing
less than to use them to further, in the widest possible manner, the effectiveness of human
activity. By this I mean not only the process of industrial manufacture but the running, for
example, of hospitals, airlines, universities, bus services and supermarkets.

The enormous potential of such an approach had long been foreseen by systems engineers
(see  for  example,  Jenkins  &  Youle  (1971))  but  until  the  current  demonstration  by  Japan  of 
its  practicability,  it  had  been  largely  ignored. 

Organization of Quality Improvement with Simple Tools

The  less  sophisticated  problems  in  quality  improvement  can  often  be  solved  by 
informed  observation  using  some  very  simple  tools  that  are  easily  taught  to  the  workforce. 

While on the one hand, Murphy’s law implacably ensures that anything that can go
wrong with a process will eventually go wrong, this same law also ensures that every
process produces data which can be used for its own improvement. In this sentence the
word process could mean an industrial manufacturing process, or a process for ordering
supplies or for paying bills. It could also mean the process of admission to a hospital or
of registering at a hotel or of booking an airline flight.

One major difficulty in past methods of system design was the lack of involvement of
the people closest to it. For instance, a friend of mine recently told me of the following
three  incidents  that  happened  on  one  particular  day.  In  the  morning  he  saw  his  doctor  at 
the  hospital  to  discuss  the  results  of  some  tests  that  had  been  made  two  weeks  before.  The 
results  of  the  tests  should  have  been  entered  in  his  records  but,  as  frequently  happened  at 
this  particular  hospital,  they  were  not.  The  doctor  smiled  and  said  rather  triumphantly, 
"Don’t  worry,  I  thought  they  wouldn’t  be  in  there.  I  keep  a  duplicate  record  myself 
although  I'm  not  supposed  to.  So  I  can  tell  you  what  the  results  of  your  tests  are."  Later 
that  day  my  friend  flew  from  Chicago  to  New  York  and  as  the  plane  was  taxiing  prior  to 
takeoff  there  was  a  loud  scraping  noise  at  the  rear  of  the  plane.  Some  passengers  looked 
concerned  but  said  nothing.  My  friend  pressed  the  call  button  and  asked  the  stewardess 
about it. She said, "This plane always makes that noise but obviously I can't do anything
about it." Finally on reaching his hotel in the evening he found that his room, the
reservation for which had been guaranteed, had been given to someone else. He was told
that he would be driven to another hotel and that because of the inconvenience his room
rate would be reduced. In answer to his protest the reservation clerk said, "I'm very sorry
- it's the system. It's nothing to do with me."

In  each  of  these  examples  the  system  was  itself  providing  data  which  could  have  been 
used  to  improve  it.  But  in  every  case  improvement  was  frustrated,  because  the  doctor,  the 
stewardess and the hotel clerk each believed that there was nothing they could do to alter a
process  that  was  clearly  faulty.  Yet  each  of  the  people  involved  was  much  closer  to  the 
system  than  those  who  had  designed  it  and  who  were  insulated  from  receiving  data  on 
how  it  could  be  improved. 

Improvement  could  have  resulted  if,  in  each  case,  a  routine  had  been  in  place 
whereby  data  coming  from  the  system  were  automatically  used  to  correct  it. 

To  achieve  this  it  would  first  have  been  necessary 

(a) to instill the idea that quality improvement was each individual person's
responsibility,

(b)  to  move  responsibility  for  the  improvement  of  the  system  to  a  quality 
improvement  team  which  included  the  persons  actually  involved, 

(c)  to  organize  collection  of  appropriate  data  (not  as  a  means  of  apportioning  blame, 
but  to  provide  material  for  team  problem-solving  meetings). 

The  quality  team  of  the  hospital  records  system  might  include  the  doctor,  the  nurse, 
someone  from  the  hospital  laboratory  and  someone  from  the  records  office.  For  the 
airplane  problem,  the  team  might  include  the  stewardess,  the  captain,  and  the  person 
responsible  for  the  mechanical  maintenance  of  the  plane.  For  the  hotel  problem,  the  team 
might  include  the  hotel  clerk,  the  reservations  clerk,  and  someone  responsible  for 
computer systems. It is the responsibility of such teams to conduct a relentless and
never-ending war against Murphy's regime. Because their studies often reveal the need for
additional expertise, and because it will not always be within the power of the team to
institute  appropriate  corrective  action,  it  is  necessary  that  adequate  channels  for 
communication  exist  from  the  bottom  to  the  top  of  the  organization,  as  well  as  from  the  top 
to  the  bottom. 

Three  potent  weapons  in  the  campaign  for  quality  improvement  are  Corrective 
Feedback,  Preemptive  Feedforward  and  Simplification.  The  first  two  are  self  explanatory. 
The  importance  of  simplification  has  been  emphasized  and  well  illustrated  by  Tim  Fuller 
(1986).  In  the  past  systems  have  tended  to  become  steadily  more  complicated  without 
necessarily  adding  any  corresponding  increase  in  effectiveness.  This  occurs  when 

(a)  the  system  develops  by  reaction  to  occasional  disasters, 

(b)  action,  supposedly  corrective,  is  instituted  by  persons  remote  from  the  system, 

(c)  no  check  is  made  on  whether  corrective  action  is  effective  or  not. 

Thus  the  institution  by  a  department  store  of  complicated  safeguards  in  its  system  for 
customer  return  of  unsatisfactory  goods  might  be  counter-productive;  while  not  providing 
sufficient deterrence to a mendacious few, it could cause frustration and rejection by a large
number  of  honest  customers.  By  contrast  the  success  of  a  company  such  as  Marks  & 
Spencer  who  believe  instead  in  simplification  and,  in  particular,  adopt  a  very  enlightened 
policy  toward  returned  goods,  speaks  for  itself. 

Because  complication  provides  work  and  power  for  bureaucrats,  simplification  must 
be  in  the  hands  of  people  who  can  benefit  from  it.  The  time  and  money  saved  from 
quality  improvement  programs  of  this  sort  far  more  than  compensates  for  that  spent  in 
putting  them  into  effect.  No  less  important  is  the  great  boost  to  the  morale  of  the 
workforce  that  comes  from  their  knowing  that  they  can  use  their  creativity  to  improve 
efficiency  and  reduce  frustration. 

Essential to the institution of quality improvement is the redefinition of the role of the
manager. He should not be an officer who conceives, gives and enforces orders but rather
a coach who encourages and facilitates the work of his quality teams.


At  a  slightly  more  sophisticated  level  the  process  of  informed  observation  may  be 
facilitated  by  a  suitable  set  of  statistical  aids  typified,  for  example,  by  Ishikawa’s  seven 
tools.  They  are  described  in  an  invaluable  book  available  in  English  (Ishikawa  1976)  and 
written  for  foremen  and  workers  to  study  together.  The  tools  are  check  sheets,  Pareto 
charts,  cause-effect  diagrams,  histograms,  graphs,  stratification  and  scatter  plots.  They 
can be used for the study of service systems as well as manufacturing systems but I will
use an example of the latter kind (see, for example, Box and Bisgaard (1987)).

Suppose a manufacturer of springs finds at the end of a week that 75 springs
have been rejected as defective. These rejects should not be thrown away but studied by a
quality team of the people who make them. As necessary this team would be augmented
from time to time with appropriate specialists. A tally on a check sheet could categorize
the rejected springs by the nature of the defect. Display of these results on a Pareto chart
might then reveal the primary defect to be, say, cracks. To facilitate discussion of what
might cause the cracks the members of the quality team would gather around a blackboard
and clarify their ideas using a cause-effect diagram. A histogram categorizing cracks by
size would provide a clear picture of the magnitude of the cracks and of how much they varied.
This  histogram  might  then  be  stratified,  for  example,  by  spring  type.  A  distributional 
difference  would  raise  the  question  as  to  why  the  cracking  process  affected  the  two  kinds 
of  springs  differently  and  might  supply  important  clues  as  to  the  cause.  A  scatter  plot 
could  expose  a  possible  correlation  of  crack  size  with  holding  temperature  and  so  forth. 
With  simple  tools  of  this  kind  the  team  can  work  as  quality  detectives  gradually  "finding 
and  fixing"  things  that  are  wrong. 
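
The check sheet and Pareto steps in the spring example can be sketched in a few lines. The tallies below are invented for illustration: the text gives only the total of 75 rejects and names cracks as a possible leading defect, so the other categories and all counts are assumptions, as is the function name.

```python
from collections import Counter

# Hypothetical check-sheet tallies for the 75 rejected springs.
rejects = (["cracks"] * 42 + ["scratches"] * 15 + ["wrong length"] * 10
           + ["pitting"] * 5 + ["other"] * 3)


def pareto_table(defects):
    """Return (category, count, cumulative percent) rows, largest count first."""
    counts = Counter(defects).most_common()
    total = sum(n for _, n in counts)
    rows, running = [], 0
    for category, n in counts:
        running += n
        rows.append((category, n, 100.0 * running / total))
    return rows


rows = pareto_table(rejects)
for category, n, cum_pct in rows:
    print(f"{category:<14}{n:>4}{cum_pct:>8.1f}%")
```

Sorting by count puts the dominant defect first, which is what directs the team's cause-effect discussion toward it.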

It is sometimes asked if such methods work outside Japan. One of many instances
showing that they can is supplied by a well-known Japanese manufacturer making television
sets  just  outside  Chicago  in  the  United  States.  The  plant  was  originally  operated  by  an 
American  company  using  traditional  methods  of  manufacture.  When  the  Japanese 


company  first  took  over  the  reject  rate  was  146%.  This  meant  that  most  of  the  television 
sets  had  to  be  taken  off  the  line  once  to  be  individually  repaired  and  some  had  to  be  taken 
off  twice.  By  using  simple  "find  and  fix"  tools  like  those  above  the  reject  rate  over  a 
period of 4-5 years was reduced from 146% to 2%. Although this was a Japanese
company  only  Americans  were  employed,  and  a  visitor  could  readily  ascertain  that  they 
greatly  preferred  the  new  system. 

Evolutionary  Operation 

Evolutionary  Operation  (EVOP)  is  an  example  of  how  elementary  ideas  of 
experimental  design  can  be  used  by  the  whole  workforce.  The  central  theme  (Box  1957, 
Box  and  Draper  1969)  is  that  an  operating  system  can  be  organized  which  mimics  that  by 
which  biological  systems  evolve  to  optimal  forms. 

For manufacturing, let us say, a chemical intermediate, the standard procedure is to
continually run the process at fixed levels of the process conditions: temperature, flow
rate, pressure, agitation speed and so forth. Such a procedure may be called static
operation.  However,  experience  shows  that  the  best  conditions  for  the  full  scale  process 
are  almost  always  somewhat  different  from  those  developed  from  smaller  scale 
experimentation  and  furthermore,  that  some  factors  important  on  the  full  scale  cannot 
always  be  adequately  simulated  in  smaller  scale  production.  The  philosophy  of 
Evolutionary  Operation  is  that  the  full  scale  process  may  be  run  to  produce  not  only 
product,  but  also  information  on  how  to  improve  the  process  and  the  product.  Suppose 
that  temperature  and  flow  rate  are  the  factors  chosen  for  initial  study.  In  the  evolutionary 
operation  mode  small  deliberate  changes  could  be  made  in  these  two  factors  in  a  pattern 
(an  experimental  design)  about  the  current  best-known  conditions.  By  continuous 
averaging  and  comparison  of  results  at  the  slightly  different  conditions  as  they  come  in, 
information gradually accumulates which can point to a direction of improvement where,
for example, higher conversion or less impurity can be obtained.

Important aspects of Evolutionary Operation are

(a) it is an alternative method of continuous process operation. It may therefore be
run indefinitely as new ideas evolve and the importance of new factors is
realized.

(b)  it  is  run  by  plant  operators  as  a  standard  routine  with  the  guidance  of  the 
process  superintendent  and  occasional  advice  from  an  evolutionary  operation 
committee.  It  is  thus  very  sparing  in  the  use  of  technical  manpower. 

(c) it was designed for improving yields and reducing costs in the chemical and
process industries. In the parts industries, where the problem is often that of
reducing variation, by studying variances instead of means at the various process
conditions the process can be made to evolve to one where variation is
minimized.
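
The small deliberate changes and continuous averaging described for Evolutionary Operation can be sketched as follows. This is an illustration only: it assumes a two-level pattern in temperature and flow rate about the current conditions, the conversion figures are invented, and the function name is hypothetical.

```python
# Corner settings of a 2x2 pattern in coded units (-1, +1) for temperature and
# flow rate, placed about the current best-known conditions.
DESIGN = [(-1, -1), (+1, -1), (-1, +1), (+1, +1)]


def evop_effects(cycles):
    """cycles: list of cycles, each a list of four responses in DESIGN order.

    Returns (temperature effect, flow effect, interaction) computed from the
    running averages at the four corners.
    """
    n = len(cycles)
    means = [sum(cycle[i] for cycle in cycles) / n for i in range(4)]
    temp = sum(t * m for (t, _), m in zip(DESIGN, means)) / 2.0
    flow = sum(f * m for (_, f), m in zip(DESIGN, means)) / 2.0
    inter = sum(t * f * m for (t, f), m in zip(DESIGN, means)) / 2.0
    return temp, flow, inter


# Hypothetical conversion (%) from three production cycles.
cycles = [
    [70.1, 71.0, 69.4, 70.6],
    [69.8, 71.3, 69.7, 70.9],
    [70.3, 71.2, 69.5, 70.5],
]
temp_effect, flow_effect, interaction = evop_effects(cycles)
print(f"temperature {temp_effect:+.2f}, flow {flow_effect:+.2f}, "
      f"interaction {interaction:+.2f}")
```

As more cycles accumulate, the averages steady and a persistent effect (here, higher temperature and lower flow) suggests the direction in which to move the pattern.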


Design  of  Experiments  for  Engineers 

The  methods  described  so  far  are  ways  of  doing  the  best  we  can  with  what  we  have, 
assuming  that  the  basic  design  of  the  product  we  produce  and  the  process  that  produces  it 
are essentially immutable. Obviously a product or process which suffers from major
deficiencies of design cannot be improved beyond a certain point by these methods.

However by artful design of a new product and of the process that makes it, it may be
possible to arrive both at a highly efficient process of manufacture and a product that
behaves well and almost never goes wrong. The design of new products and processes is
a  fertile  field  for  the  employment  of  statistical  experimental  design  by  engineers. 

Which? How? Why?

Suppose y is some quality characteristic whose probability distribution depends on
the level of a number of factors x. Experimental design may be used to reveal certain
aspects of this dependence; in particular how the mean E(y) = f(x), and the variance
σ²(y) = F(x), depend on x. Both choice of design and method of analysis are greatly
affected by what we know or what we think we know about the input variables x and the
functions f(x) and F(x) (see for example Box, Hunter and Hunter (1978)).


Which: In the early stages of investigation the task may be to determine which
subset of variables x_k chosen from the larger set {x} are of importance in affecting y.
In this connection a Pareto hypothesis (a hypothesis of "effect sparsity") becomes
appropriate and the projective properties into lower dimensions in the factor space of
highly fractionated designs may be used to find an active subset of k or fewer active
factors. Analyses based on normal plots and/or Bayesian methods (Box and Meyer
(1986a)) are efficient and geometrically appealing.
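
A small sketch may help fix the idea of screening under effect sparsity. It assumes an eight-run half fraction of a two-level design in four factors with generator D = ABC, and noise-free invented responses in which only two factors are active; the factor labels and function names are hypothetical.

```python
from itertools import product


def half_fraction():
    """Eight-run 2^(4-1) design in coded units with generator D = ABC."""
    return [(a, b, c, a * b * c) for a, b, c in product((-1, 1), repeat=3)]


def main_effects(design, y):
    """Main effect of each factor: mean response at +1 minus mean at -1."""
    half = len(design) / 2.0
    return {name: sum(run[k] * yi for run, yi in zip(design, y)) / half
            for k, name in enumerate("ABCD")}


design = half_fraction()
# Hypothetical noise-free responses in which only A and C are active.
y = [50.0 + 4.0 * a - 3.0 * c for a, b, c, d in design]
effects = main_effects(design, y)
ranked = sorted(effects, key=lambda name: abs(effects[name]), reverse=True)
print(effects, ranked[:2])
```

With real data the estimated effects would carry noise, and a normal plot of the effects would be used to separate the few active factors from the rest. Projected onto the two active factors, the eight runs form a replicated full factorial, which is the projective property referred to above.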

How: When we know or think we know which are the important variables x_k we
may need to determine more precisely how changes in their levels affect y. Often the
nature of the functions f(x) and F(x) will be unknown. However over some limited
region of interest a local Taylor's series approximation of first or second order in x_k
may provide an adequate approximation, particularly if y and x_k may be re-expressed
when necessary in appropriate alternative metrics. Fractional factorials and other response
surface designs of first and second order are appropriate here. Maxima may be found and
exploited using steepest ascent methods followed by canonical analysis of a fitted second
degree equation in appropriately transformed metrics. The possibilities for exploiting
multidimensional ridges and hence alternative optimal processes become particularly
important at this stage (see for example Box and Draper 1986).
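
For reference, the canonical analysis of a fitted second degree equation can be written compactly. This is the standard textbook form; the symbols b_0, b, B, x_s, M, λ_i and w_i are notation assumed here for illustration and are not taken from the paper.

```latex
\hat{y} = b_0 + x^{\mathsf T} b + x^{\mathsf T} B x, \qquad
x_s = -\tfrac{1}{2} B^{-1} b, \qquad
\hat{y} = \hat{y}_s + \sum_{i} \lambda_i w_i^{2}, \quad
w = M^{\mathsf T} (x - x_s)
```

Here B is the symmetric matrix with the pure quadratic coefficients on its diagonal and half of each cross-product coefficient off the diagonal, the λ_i are its eigenvalues, and the columns of M are the corresponding eigenvectors. If every λ_i is negative the stationary point x_s is a maximum; an eigenvalue close to zero indicates a ridge along the corresponding direction w_i.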


Why: Instances occur when a mechanistic model can be postulated. This might take
the form of a set of differential equations believed to describe the underlying physics.
Various kinds of problems arise. Among these are:

How should the parameters (often corresponding to unknown physical constants) be
estimated from the data?

How should candidate models be tested?

How should we select a model from competing candidates?

What kinds of experimental designs are appropriate?

Workers  in  quality  improvement  have  so  far  been  chiefly  occupied  with  problems  of 
the  "which"  and  occasionally  of  the  "how"  kind  and  have  consequently  made  most  use  of 
fractional  factorial  designs,  other  orthogonal  arrays,  and  response  surface  designs. 

Studying Location, Dispersion and Robustness

In the past, experimental design was used most often as a means of discovering
how xk affected the mean value E(y): how, for example, the process could be improved
by increasing the mean of some quality characteristic. Modern quality improvement also
stresses the use of experimental design in reducing dispersion as measured, for example,
by the variance.

Using experimental designs to minimize variation: High quality, particularly in the parts
industries (e.g. automobiles, electronics), is frequently associated with minimizing
dispersion. In particular, the simultaneous study of the effect of the variables x on the
mean and on the variance is important in the problem of bringing a process on target with
smallest possible dispersion (Phadke 1982). Bartlett and Kendall (1946) pointed to the
advantages of analysis in terms of log s_y to produce constant variance and increased
additivity in the dispersion measure. It is also very important in such studies to remove
transformable dependence between the mean and standard deviation. Taguchi
(1986, 1987) attempts to do this by the use of signal to noise ratios. However, it may be
shown that it is much less restrictive, simpler and more statistically efficient to proceed by
direct data transformation obtained, for example, by a "lambda plot" (Box 1988).
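
The idea of removing transformable dependence between mean and standard deviation can be sketched as follows. This is not the lambda plot of Box (1988). It uses the simpler, well known rule that if the standard deviation is proportional to the mean raised to a power a, then the power transformation with lambda = 1 - a (the logarithm when a = 1) roughly stabilizes the variance. The replicate data and function names are hypothetical.

```python
import math
from statistics import mean, stdev

def log_sd_vs_log_mean_slope(runs):
    """Least-squares slope of log(s) on log(mean) across replicated runs."""
    xs = [math.log(mean(r)) for r in runs]
    ys = [math.log(stdev(r)) for r in runs]
    xbar, ybar = mean(xs), mean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sxy / sxx

def power_transform(y, lam):
    """Simple power transformation; lam = 0 is taken to mean the logarithm."""
    return math.log(y) if lam == 0 else y ** lam

# Hypothetical replicated runs in which the spread grows with the mean.
runs = [[9.0, 10.0, 11.0], [18.0, 20.0, 22.0], [36.0, 40.0, 44.0]]

slope = log_sd_vs_log_mean_slope(runs)
lam = round(1.0 - slope, 6)          # suggested transformation power
transformed = [[power_transform(y, lam) for y in r] for r in runs]
sds_after = [stdev(r) for r in transformed]
print(slope, lam, sds_after)
```

In this made-up example the standard deviation is exactly proportional to the mean, so the suggested transformation is the logarithm, and after transforming the three runs have the same standard deviation. Dispersion effects can then be studied without being confounded with changes in the mean.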

A practical difficulty may be the very large number of experimental runs which may
be needed in such studies if complicated designs are employed. It has recently been shown
how, using what Fisher called hidden replication, unreplicated fractions may sometimes be
employed to identify sparse dispersion effects in the presence of sparse location effects
(Box and Meyer 1986b).

Experimental Design and Robustness to the Environment: A well designed car will start
over a wide range of conditions of ambient temperature and humidity. The design of the
starting mechanism may be said to be "robust" to changes in these environmental
variables. Suppose E(y) and possibly also σ²(y) are functions of certain design
variables xd which determine the design of the system and also of some external
environmental variables xe which, except in the experimental environment, are not under
our control. The problem of robust design is to choose a desirable combination of design
variables xd at which good performance is experienced over a wide range of
environmental conditions.

Related problems were earlier considered by Youden (1961a,b) and Wernimont
(1975), but recently their great importance in quality improvement has been pointed out by
Taguchi. His solution employs an experimental design which combines multiplicatively
an "inner" design array and an "outer" environmental array. Each piece of this
combination is usually a fractional factorial design or some other orthogonal array. Recent
research has concentrated on various means for reducing the burdensome experimental
effort which presently may be needed for studies of this kind.
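
The multiplicative combination of an inner and an outer array can be sketched as follows, assuming for illustration a 2^(3-1) half fraction in three design factors crossed with a full 2^2 factorial in two environmental factors. The particular arrays and function names are hypothetical choices, not ones taken from the lecture.

```python
from itertools import product

def half_fraction_2_3():
    """A 2^(3-1) fractional factorial with generator C = AB (levels -1, +1)."""
    return [(a, b, a * b) for a, b in product((-1, 1), repeat=2)]

def full_factorial(k):
    """All 2^k combinations of k two-level factors."""
    return list(product((-1, 1), repeat=k))

def crossed_array(inner, outer):
    """Run every inner (design) setting at every outer (environmental) setting."""
    return [(design, env) for design in inner for env in outer]

inner = half_fraction_2_3()      # 4 settings of the design variables
outer = full_factorial(2)        # 4 settings of the environmental variables
runs = crossed_array(inner, outer)

print(len(inner), "x", len(outer), "=", len(runs), "runs")
for design, env in runs[:4]:
    print("design", design, "environment", env)
```

The run count is the product of the two array sizes, which is why the experimental effort grows so quickly and why means of reducing it are of interest.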

Robustness  of  an  assembly  to  variation  in  its  components:  In  the  design  of  an  assembly, 
such  as  an  electrical  circuit,  the  exact  mathematical  relation  y  =  f(x)  between  the  quality 
characteristic  of  the  assembly,  such  as  the  output  voltage  y  of  the  circuit,  and  the 
characteristics  x  of  its  components  (resistors,  capacitors,  etc.)  may  be  known  from 
physics.  However  there  may  be  an  infinite  variety  of  configurations  of  x  that  can  give 
the same desired mean level E(y) = η, say. Thus an opportunity exists for optimal
design  by  choosing  a  "best"  configuration. 

Suppose the characteristics x of the components vary about "nominal values" ξ
with known covariance matrix V. Thus, for example, a particular resistance x_i might vary
about its nominal value ξ_i with known variance σ_i². (Also, variation in one component
would usually be independent of that of another, so that V would usually be diagonal.)

Now variation in the input characteristics x will transmit variation to the quality
characteristic y, so that for each choice of component nominal values ξ which yield the
desired output y = η there will be an associated mean square error
E(y − η)² = M(η) = F(ξ).

Using a Wheatstone Bridge circuit for illustration, Taguchi and Wu (1985) pose the
problem of choosing ξ so that M(η) is minimized. To solve it they again employ an
experimental strategy using inner and outer arrays. Box and Fung (1986) have pointed
out, however, that their procedure does not in general lead to an optimal solution and that
it is better to use a simpler and more general method employing a standard numerical
nonlinear optimization routine. The latter authors also make the following further points.

(a)  For  an  electrical  circuit  it  is  reasonable  to  assume  that  the  relation  y  =  f(x)  is 
known,  but  when,  as  is  usually  the  case,  y  =  f(x)  must  be  estimated 
experimentally,  the  problems  are  much  more  complicated  and  require  further 
study. 
(b) It is also supposed that each of the σ_i is known and, furthermore, that they
remain fixed or change in a known way (for example proportionally) when ξ_i
changes. The nature of the optimal solution can be vastly different depending
on the validity of such assumptions.
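
A small numerical sketch may make point (b) concrete. It is not the Wheatstone Bridge problem. It assumes a hypothetical assembly y = x1 x2 with target η = 12, uses the first-order approximation for the transmitted variance, and substitutes a simple golden-section search for the "standard numerical nonlinear optimization routine". All names and numbers are illustrative assumptions.

```python
import math

def transmitted_mse(xi1, eta, sigma_of):
    """First-order transmitted variance of y = x1 * x2 when xi1 * xi2 = eta.

    sigma_of(xi1, xi2) returns the component standard deviations (s1, s2).
    """
    xi2 = eta / xi1                      # keeps the mean on target
    s1, s2 = sigma_of(xi1, xi2)
    return (xi2 * s1) ** 2 + (xi1 * s2) ** 2

def golden_section_min(fn, lo, hi, tol=1e-9):
    """Minimize a unimodal function of one variable on [lo, hi]."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - invphi * (b - a)
        d = a + invphi * (b - a)
        if fn(c) < fn(d):
            b = d
        else:
            a = c
    return (a + b) / 2.0

ETA = 12.0

def fixed(xi1, xi2):
    return 0.1, 0.1                      # sigmas do not depend on the nominals

def proportional(xi1, xi2):
    return 0.01 * xi1, 0.01 * xi2        # sigmas proportional to the nominals

best_xi1 = golden_section_min(lambda v: transmitted_mse(v, ETA, fixed), 0.5, 20.0)
print("fixed sigmas: best xi1 =", best_xi1)
print("proportional sigmas:",
      transmitted_mse(1.0, ETA, proportional),
      transmitted_mse(7.0, ETA, proportional))
```

With fixed standard deviations there is a definite best configuration (here ξ1 = ξ2 = √12), whereas with proportional standard deviations every configuration on target transmits the same variance. This is one simple way in which the nature of the optimal solution depends on the assumption made about the σ_i.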

Taguchi's quality engineering ideas are clearly important and present a great
opportunity for development. It appears, however (see, for example, Box, Bisgaard and
Fung (1988)), that the accompanying statistical methods that Taguchi recommends,
employing "accumulation analysis," "signal to noise ratios" and "minute analysis," are
often defective, inefficient and unnecessarily complicated. Furthermore, Taguchi's
philosophy seems at times to imply a substitution of statistics for engineering rather than
the use of statistics as a catalyst to engineering (Box 1988). Because such deficiencies can
be easily corrected, it is particularly unfortunate that, in the United States at least,
engineers are often taught these ideas by instructors who stress that no deviation from
Taguchi's exact recipe is permissible.

A  Wider  Domain  for  Scientific  Method 

Quality  improvement  is  about  finding  out  how  to  do  things  better.  The  efficient  way 
to  do  this  is  by  using  scientific  method — a  very  powerful  tool,  employed  in  the  past  by 
only a small elite of trained scientists. Modern quality improvement extends the domain of
scientific  method  over  a  number  of  dimensions: 

over  users  (e.g.  from  the  chief  executive  officer  to  the  janitor) 

over  areas  of  human  endeavor  (e.g.  factories,  hospitals,  airlines,  department  stores) 

over  time  (never-ending  quality  improvement) 

over causative factors (an evolving panorama of factors that affect the operation of a system).

Users: Although it is not possible to be numerically precise, I find a rough graphical
picture helpful to understanding. The distribution of technological skill in the workforce
might look something like Figure 1(a). The distribution of technological skill required to
solve the problems that routinely reduce the efficiency of factories, hospitals, bus
companies and so forth might look something like Figure 1(b).

In the past only those possessing highly trained scientific or managerial talent would
have been regarded as problem solvers. Inevitably this small group could only tackle a
small proportion of the problems that beset the organization. One aspect of the new
approach is that many problems can be solved by suitably trained persons of lesser
technical skills. An organization that does not use this talent throws away a large
proportion of its creative potential. A second aspect is that engineers and technologists
must be taught how to experiment simultaneously with many variables in the presence of
noise. Without knowledge of statistical experimental design they are not equipped to do
this. A group visiting Japanese industry was recently told "an engineer who does not
know statistical experimental design is not an engineer" (Box et al 1988).

Figure 1. (a) Hypothetical frequency distribution of workers possessing
given technical skill
(b) Hypothetical frequency distribution of problems requiring
various degrees of technological skill for their solution

Areas  of  Endeavor:  At  first  sight  we  tend  to  think  of  quality  improvement  as  applying 
only  to  operations  on  the  factory  floor.  But  even  in  manufacturing  organizations  a  high 
proportion  of  the  workforce  are  otherwise  engaged  —  in  billing,  invoicing,  planning, 
scheduling  and  so  forth  -  all  of  which  should  be  made  the  subject  of  study. 

But  the  individual  citizen  in  every  day  of  his  life  must  deal  with  an  unnecessarily 
complex  world,  involving  hospitals,  government  departments,  universities,  airlines  and  so 
forth.  Lack  of  quality  in  these  organizations  results  in  needless  expense,  wasted  time  and 
unnecessary  frustration.  Quality  improvement  applied  to  these  activities  could  free  us  all 
for  more  productive  and  pleasurable  pursuits. 

Time:  For  never  ending  improvement  there  must  be  a  long-term  commitment  to  renewal. 
A  commonly  used  statistical  model  links  a  set  of  variables  xk  with  a  response  y  by  an 

equation  y  =  f(xk)  +  e  where  e  is  an  error  term,  often  imbued  by  statisticians  with 

properties  of  randomness,  independence  and  normality.  A  more  realistic  version  of  this 
model  is 

y  =  f(xk)  +  e(xu) 

where xu is a set of variables whose nature and behavior are unknown. As time elapses,
by skillful use of the techniques of informed observation and experimental design,
elements of xu are transferred into xk, from the unknown into the known.

This transference is the essence of modern quality improvement and has two consequences:

(a)  once  a  previously  unknown  variable  has  been  identified  it  can  be  fixed  at  a  level 
that  produces  best  results. 

(b)  by  fixing  it  we  remove  an  element  previously  contributing  to  variation. 

This transfer can be a never-ending process whereby knowledge increases and variation is
reduced.


The structure of the process of investigation, so far as it involves statistics, has not
always been understood. It has sometimes been supposed that it consists of testing a null
hypothesis selected in advance against alternatives using a single set of data. In fact most
investigations proceed in an iterative fashion in which deduction and induction proceed in
alternation (see, for example, Box (1980)). Clearly optimization of individual steps in
such a process can lead to sub-optimization for the investigational process itself. The
inferential process of estimation, whereby a postulated model and supposedly relevant data
are combined, is purely deductive and resembles a process of "summation" of model and
data. It is conditional on the assumption that the model of the data generating process and
the actual process itself are consonant. No warning that they are not consonant is provided
by estimation. However, a comparison of appropriate quantities derived from the data with
a sampling reference distribution generated by the model provides a process of criticism
that can not only discredit the model but suggest appropriate direction for model
modification. An elementary example of this is Shewhart's "assignable cause" deduced
from data outside control lines which are calculated from a model of the data-generating
process in a state of control. Such a process of criticism contrasts features of the model
and the data and thus resembles a process of "differencing" of model and data. It can lead
the engineer, scientist or technologist by a process of induction to postulate a modified, or
a totally different, model, so recharting the course for further exploration. This process is
subjective and artistic. It is the only step that can introduce new ideas and hence must be
encouraged above all else. It is best encouraged, I believe, by interactive graphical
analysis. This is readily provided these days by computers, which also make it possible to
use sophisticated statistical ideas that are calculation-intensive and yet produce simply
understood graphical output. It is by following such a deductive-inductive iteration that
the quality investigator can be led to a solution of a problem just as a good detective can
solve a murder mystery.
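
The Shewhart example can be sketched in a few lines. The sketch assumes an individuals chart whose control lines are set from baseline data believed to be in a state of control, with the standard deviation estimated from the average moving range divided by the usual constant 1.128. The data and function names are hypothetical.

```python
from statistics import mean

D2_FOR_N2 = 1.128   # standard constant relating the mean moving range to sigma

def individuals_limits(baseline):
    """Three-sigma control lines from in-control baseline observations."""
    center = mean(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    sigma_hat = mean(moving_ranges) / D2_FOR_N2
    return center - 3.0 * sigma_hat, center, center + 3.0 * sigma_hat

def signals(observations, lcl, ucl):
    """Indices of observations outside the control lines."""
    return [i for i, y in enumerate(observations) if y < lcl or y > ucl]

# Hypothetical data: a baseline in control, then new observations to criticize.
baseline = [10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 10.0, 9.9]
new_obs = [10.2, 9.7, 10.9, 10.0]

lcl, center, ucl = individuals_limits(baseline)
flagged = signals(new_obs, lcl, ucl)
print(round(lcl, 3), round(center, 3), round(ucl, 3), flagged)
```

A flagged point is the "difference" between model and data: it does not say what went wrong, but it tells the investigator where to look for an assignable cause.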



Factors  and  Assignable  Causes:  The  field  of  factors  potentially  important  to  quality 
improvement  also  can  undergo  seemingly  endless  expansion.  Problems  of  optimization 
are  frequently  posed  as  that  of  maximizing  some  response  y  over  a  k-dimensional  space 
of known factors xk, but in quality improvement the factor space is never totally known

and  is  continually  developing. 

Consider a possible scenario for a problem which begins as that of choosing the
reaction time x1 and reaction temperature x2 to give maximum conversion y of raw
materials to the desired product. Suppose experimentation with these two factors leads to
the (conditional) optimal choice of coordinates in Figure 2(a). Since conversion is only
50% we see that the best is not very good if we restrict the system in this way. After some
deliberation it is now suggested that higher conversion might be obtained if by-products
which may be formed at the beginning of the reaction were suppressed by
employing a lower temperature in the early stages. This idea produces two new variables,
the initial temperature x3 and the time x4 taken to reach the final temperature. Their best
values, and the appropriately changed levels of x1 and x2, might be those shown in
Figure 2(b). This new (conditional) optimal profile again results in only partial success
(y = 68%), leading to the suggestion that the (newly increased) final temperature may result
in other by-products being formed at the later stages of reaction. This suggests
experimentation with variables x5 and x6 allowing for a fall off in temperature towards
the end of the reaction. The new (conditional) optimal profile at this stage might then be as
in Figure 2(c).

These results might now be seen by a physical chemist, leading him to suggest a
mechanistic theory yielding a series of curved profiles which depended on only two
theoretical constants X1 and X2. If this idea were successful, the introduction of two new
experimental factors would have eliminated the need for the other six. The new
mechanistic theory could in turn suggest new factors, not previously thought of, which
might produce even greater conversion and so on. Thus the factor space must realistically
be regarded as one which is continually changing and developing as part of the evolution
which occurs within the scientific process in a manner discussed earlier by Kuhn (1962).


Training 


Instituting  the  necessary  training  for  quality  is  a  huge  and  complex  task.  Some 
assessment  must  be  made  of  the  training  needs  for  the  workforce,  for  engineers, 
technologists  and  scientists,  and  for  managers  at  various  levels,  and  we  must  consider 
how  such  training  programs  can  be  organized  using  the  structure  that  we  have  within 
industry,  service  organizations,  technical  colleges  and  universities. 

A  maximum  multiplication  effect  will  be  achieved  by  a  scheme  in  which  the  scarce 
talent  that  is  available  is  employed  to  teach  the  teachers  within  industry  and  elsewhere. 

It is, I believe, unfortunately true that the number of graduates who are interested in
industrial statistics in Great Britain has been steadily decreasing. The reasons for this are
complex, and careful analysis and discussion between industry, the government, and the
statistical fraternity is necessary to discover what might be done to rectify the situation.

Management 

The  case  for  the  extension  of  scientific  method  to  human  activities  is  so  strong  and  so 
potentially  beneficial  that  one  may  wonder  why  this  revolution  has  not  already  come 
about.  A  major  difficulty  is  to  persuade  the  managers. 

In the United States it is frequently true that both higher management and the
workforce are in favor of these ideas. Chief executive officers whose companies are
threatened with extinction by foreign competition are readily convinced and are prepared to
exhort their employees to engage in quality improvement. This is certainly a step in the
right direction but it is not enough. I recently saw a poster issued jointly by the Union and
the Management of a large automobile company in the United States which reiterated a
Chinese proverb: "Tell me, I'll forget; show me, I may remember; involve me and
I'll understand". I'm not sure that top management always realize that their involvement
(not just their encouragement) is essential. The workforce enjoy taking part in quality
improvement and will cooperate provided they believe that the improvements they bring
about will not be used against them.

Some  members  of  the  middle  levels  of  management  and  of  the  bureaucracy  pose  a 
more  serious  problem  because  they  see  loss  of  power  in  sharing  the  organization  of  quality 
improvement  with  others.  Clearly  the  problems  of  which  Mr.  Gorbachev  complains  in 
applying  his  principles  of  perestroika  are  not  confined  to  the  Soviet  Union. 

Thus  the  most  important  questions  are  really  not  concerned  so  much  with  details  of 
the  techniques  but  with  whether  a  process  of  change  in  management  can  be  brought  about 
so  that  they  can  be  used  at  all.  It  is  for  this  reason  that  Dr.  Deming  and  his  followers  have 
struggled  so  hard  with  this  most  difficult  problem  of  all  —  that  of  inducing  the  changes  in 
management  and  instituting  the  necessary  training  that  can  allow  the  powerful  tools  of 
scientific method to be used for quality improvement. It is on the outcome of this struggle
that  our  economic  future  ultimately  depends. 


Fisher's Legacy

At  the  beginning  of  this  lecture  I  portrayed  Fisher  as  a  man  who  developed  statistical 
methods  of  design  and  analysis  that  extended  the  domain  of  science  from  the  laboratory  to 
the  whole  world  of  human  endeavor.  We  have  the  opportunity  now  to  bring  that  process 
to  full  fruition.  If  we  can  do  this  we  can  not  only  look  forward  to  a  rosier  economic 
future,  but  by  making  our  institutions  easier  to  deal  with,  we  can  improve  the  quality  of 
our  everyday  lives,  and  most  important  of  all  we  can  joyfully  experience  the  creativity 
which  is  a  part  of  every  one  of  us. 

Customers' Processing

Example: Bales of Fibers (8") used for Carpets

Problems/Blessings in Chemical Processing

Necessary (e.g., Carpet

Example: Handling Uncertainty in Engineering Design and Quality Improvement

"The real job of any business is creating value"

VARIANCE COMPONENT DESIGN FOR PROCESS

• SAMPLE-TO-SAMPLE (TIME)

• POSITION-TO-POSITION

• FILAMENT-TO-FILAMENT

FILAMENT-TO-FILAMENT AUTOMATICALLY ESTIMATED BY INSTRUMENT. ABOUT 50
FILAMENTS PER SAMPLE.

STATISTICAL DESIGN RESULTS

Sample Sequence.

Samples taken every 12 hours over 2 days.

Session III.

Statistics in Design for Flexible Automation

The Problem of Methods Divergence in Flexible Automation

Methods Divergence is a Problem

Definition of Methods Divergence

Quality Assessment B

What is Needed:

Key Concepts

Design Intent: Interchangeable Assembly

Shifted Plates

Quality Assurance Folklore, Scene 1

Functional Gaging Systems: Simulation of Worst-Case M;

Errors in Functional Gage Systems

Quality Assurance Reality, Scene 2

Sample Measurement Systems:

Experiment: Add errors to individual points. Investigate differences
between fitting algorithms.

Sample Measurement Systems

Errors in Sample Measurement Systems

of the Problem

The Solution

A Bayesian Perspective on Tolerancing

By N. D. Singpurwalla

Preamble:

My review of the literature on "statistical tolerancing" leads me to believe that
there may be a lack of uniformity with reference to the meaning and interpretation
of the notion of tolerance.

The above consideration leads me to propose the following as a streamlined approach
to viewing tolerance. Is my approach well known?

We attempt to model the design engineer's perspective.

• Let D be some dimension of interest to a designer 𝒟. Ex.: D is the diameter of a bolt.

• Since the notion of a measurement, such as a length, width, circumference, etc., is an
abstraction (it exists only in our minds and so can never be realized in real life),
D is an unknown quantity in 𝒟's mind.

• All uncertainty is best described by probability, and so 𝒟 may subjectively specify
his/her uncertainty about realizing D via a probability statement (assumed symmetric):

(1)  Pr{ d − ε ≤ D ≤ d + ε } = 1 − α,

where α > 0 is a small number.

• (1) above is called a "design spec." Its frequentist interpretation is that
100(1 − α)% of items produced will have D ∈ [d − ε, d + ε].

• d is called the nominal value, and ε is called a tolerance. d + (−) ε is called an
upper (lower) specification limit.

Our objective is to specify ε, given α and d.

The Manufacturing Scenario.

• Let D′ be the dimension corresponding to D of the manufactured item.

• Once again, D′ is an unknown quantity to 𝒟, who describes his/her uncertainty about
D′ via, say, a normal distribution with mean μ and variance σ².

• Suppose further that 𝒟 is uncertain about μ, and that for 𝒟, (μ | m, τ) ~ N(m, τ²).

• It now follows (from standard arguments) that, for 𝒟,

(3)  (D′ | m, σ, τ) ~ N(m, σ² + τ²).

Note: In the literature, distributions such as (3) are given a frequentist
interpretation, and are viewed as being generated as the actual capabilities of the
manufacturing process. We take exception with this point of view and regard (3) as
𝒟's subjective assessment about the uncertainty about D′. The frequentist
interpretation of (3) fails to hold when we consider flexible manufacturing scenarios.

From (3) above, it follows that, for 𝒟,

(4)  Pr{ m − K(σ² + τ²)^(1/2) ≤ D′ ≤ m + K(σ² + τ²)^(1/2) } = 1 − β,

where, for a specified β, K is known.

m ± K(σ² + τ²)^(1/2) are known as the "natural tolerance limits", an unfortunate
choice of terminology.

m + K(σ² + τ²)^(1/2) [m − K(σ² + τ²)^(1/2)] is known as the upper (lower) tolerance limit.

Thus, to summarize, we have, as a model for 𝒟's thought processes:

(1)  Pr{ d − ε ≤ D ≤ d + ε } = 1 − α, for the design part, and

(4)  Pr{ m − K(σ² + τ²)^(1/2) ≤ D′ ≤ m + K(σ² + τ²)^(1/2) } = 1 − β, for the
manufacturing part.

We propose the following as (plausible) think pieces:

• A design is a priori good if d = m and K(σ² + τ²)^(1/2) ≤ ε when α ≥ β.

• A design is a priori tolerance optimal if d = m and K(σ² + τ²)^(1/2) = ε when α = β.
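
The two think pieces can be checked numerically. The sketch below assumes the reading of the slides used here: a nominal value d with tolerance ε and design probability 1 − α, and a normal predictive distribution for the manufactured dimension with mean m and variance σ² + τ², so that K is the standard normal quantile at 1 − β/2. These assumptions, the function names and all numerical values are hypothetical illustrations, not part of the original presentation.

```python
import math
from statistics import NormalDist

def natural_tolerance_limits(m, sigma, tau, beta):
    """Limits m -/+ K*sqrt(sigma^2 + tau^2) holding probability 1 - beta."""
    k = NormalDist().inv_cdf(1.0 - beta / 2.0)
    half_width = k * math.sqrt(sigma ** 2 + tau ** 2)
    return m - half_width, m + half_width, k

def a_priori_good(d, eps, alpha, m, sigma, tau, beta):
    """d = m, the natural tolerance half-width within eps, and alpha >= beta."""
    lower, upper, _ = natural_tolerance_limits(m, sigma, tau, beta)
    return d == m and (upper - m) <= eps and alpha >= beta

# Hypothetical numbers, for illustration only.
lower, upper, k = natural_tolerance_limits(m=10.0, sigma=0.03, tau=0.04, beta=0.05)
print(round(k, 4), round(lower, 4), round(upper, 4))
print(a_priori_good(d=10.0, eps=0.10, alpha=0.05, m=10.0, sigma=0.03, tau=0.04, beta=0.05))
print(a_priori_good(d=10.0, eps=0.09, alpha=0.05, m=10.0, sigma=0.03, tau=0.04, beta=0.05))
```

With these numbers the half-width of the natural tolerance limits is about 0.098, so a tolerance of 0.10 is judged good while a tolerance of 0.09 is not.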

Posterior Analysis.

Suppose that 𝒟 has had an opportunity to observe the manufacturing process, and as a
result obtains as data d = (d_1, ..., d_n), where d_i is a realization of D′, 1 ≤ i ≤ n.

Then, the predictive distribution

And it is a posteriori tolerance optimal if the inequalities above are replaced by
equalities.

Outline of a Rational Basis for Choosing Alternative Technologies for Q.C.

MOTIVATION

1.  Emphasis on Deterministic Optimization

2.  Robust Design

    -- Variations in Loading, Dimensions, Material
       * important while optimizing

    -- Nonlinear Functions
       * finite element modeling
       * power transmissions; dynamic systems

Algorithm for the sub-problem

    minimize    (1/2) d^T d
    subject to  g(d) = 0

Step 1  Choose initial design.
        Set k = 0.

Step 2  Solve QP:
            Minimize    (1/2) p^T p + d_k^T p
            Subject to  ∇g_k^T p + g_k = 0
        Let p_k be the solution.

Step 3  Set d_{k+1} = d_k + α p_k.

Step 4  If |g(d_k)| < TOL and ||p_k|| < TOL,
        then set β = (Σ_{i=1}^{n} d_i²)^(1/2) and stop.
        Otherwise, set k = k + 1 and go to step 2.
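
The steps above can be sketched in code. Because the QP in step 2 has a single linearized equality constraint, it has a closed-form solution through one Lagrange multiplier, which the sketch uses in place of a general QP solver. The reading of the slide (in particular that the final quantity is the norm of d), the function name, the step length α = 1 and the linear test constraints are all assumptions made for illustration.

```python
import math

def hl_index(g, grad_g, d0, alpha=1.0, tol=1e-8, max_iter=100):
    """Iterate d_{k+1} = d_k + alpha * p_k until g(d_k) and p_k are both small.

    p_k minimizes (1/2) p'p + d_k'p subject to grad_g(d_k)'p + g(d_k) = 0.
    Returns the norm of the final d together with d itself.
    """
    d = list(d0)
    for _ in range(max_iter):
        gk = g(d)
        a = grad_g(d)
        aa = sum(ai * ai for ai in a)
        if aa == 0.0:
            raise ValueError("zero constraint gradient")
        # Stationarity gives p = -d - lam * a; the constraint then fixes lam.
        lam = (gk - sum(ai * di for ai, di in zip(a, d))) / aa
        p = [-di - lam * ai for di, ai in zip(d, a)]
        if abs(gk) < tol and math.sqrt(sum(pi * pi for pi in p)) < tol:
            return math.sqrt(sum(di * di for di in d)), d
        d = [di + alpha * pi for di, pi in zip(d, p)]
    raise RuntimeError("no convergence within max_iter iterations")

# Hypothetical linear constraint g(d) = 3 - d1 - d2, starting from the origin.
index, point = hl_index(lambda d: 3.0 - d[0] - d[1],
                        lambda d: [-1.0, -1.0],
                        [0.0, 0.0])
print(index, point)
```

For a linear constraint the first step already lands on the closest point of g(d) = 0 to the origin, and the second pass only confirms convergence. For nonlinear constraints the gradient changes from step to step and a step length α below 1 may be needed.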

A PROBABILISTIC OPTIMAL DESIGN PROBLEM WITH H-L INDEX

FIRST- AND SECOND-ORDER DERIVATIVES IN STRUCTURAL RESPONSE

Finite Element Model

Direct Method

-- First-Order

Continuum Model

Shape Variations -- stochastic boundary
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variation  irt  taken  aa  St  •  0.  •  O3  *  0.2.  The 
rtiuuutr  Index  St  •  I  U  taken  aa  saae  for  all 
constraints.  The  aacarlal  weijnt  density  Is  0.1 
lb/la-3,  rating's  modulus  E  •  10'  psl,  allowable 
atreasaa  are  »t*  -  «jJ  •  5000  psl.  *2a  *  20,000  psl, 
displacement  Halts  are  •  tja  -  0.005  In.,  lower 
bound  la  natural  frequency  equals  2178  Hs.  and  a  lower 
at  0.05  In2  Is  lapoaed  on  eaen  design  variable. 
The  starting  design  Is  r  •  (10.0,  5.0,  5.0)  Ini.  Many 
other  starting  designs  were  used  and  resulted  in  Che 
saae  solutions. 

Table  2  contains  results  for  various  values  at  I. 


previous  remark,  we  can  also  conclude  that  for  a 
fixed  1-1,  there  Is  a  sudden  change  In  the  governing 
failure  Bodes  when  3  Is  Increased  Cram  0.0*  to  O.Oi. 
This  also  substantlatss  the  fact  that  considering  only 
the  failure  sodes  based  on  a  deterministic  design  ta 
generally  unsafet  unexpected  failure  nodes  can  play  s 
role  in  probabilistic  design. 

Thirdly,  It  Is  Interesting  ta  conduct  parametric 
studies  related  to  S.  Specifically,  changes  In 
sptlsua  cost  and  cnanges  m  the  sensitivity  values 
with  respect  to  6  aay  provide  the  designer  with 
additional  Insights. 

Tension— Toareresston  Soring  (fix.  5) 
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It  say  be  noted  that  the  deterministic  solution  Is 
obtained  by  solving  problem  (28)  and  ccrreo  ponds .  In 
effect,  to  a  value  of  I  -  0.  The  design  sensitivity 
analysis  expressions  derived  In  this  pacer  have  been 
verified  using  a  divided-difference  scheme  based  on 


(38)  dg/dbt  -  (g(»r. 

g(vl. 


vki)/c 


*k>  * 


*>e^m  c  Is  s  seall  number.  The  sensitivity  of  the 
•etlve  second-moment  criteria,  d/dv  (l  -  dj/S)  for  l  * 
I.  are  printed  out  In  Table  2.  The  results  In  Tabls  2 
provide  the  designer  with  a  choice  of  designs, 
depending  on  the  desired  reliability  level. 

Firstly, note that no solution exists for $\beta \ge 1$.
Since the coefficient of variation $\delta$ is the same for
each variable, the results depend only on the product
$\delta\beta$. Thus, the absence of a solution for
$\delta = 0.2$ and $\beta \ge 1$ also holds true for
$\delta = 0.1$ and $\beta \ge 2$. Thus, for a specified
value of $\delta$, the maximum reliability of the
structure is known.

Secondly, note the sudden change in the active
set when going from $\beta = 0.2$ to $\beta = 0.3$. Based on the


The problem [18] is to minimize the (expected)
weight of the spring subject to constraints on minimum
deflection, shear stress, surge frequency, limits on
outside diameter and on design variables. The wire
diameter $d$, coil diameter $D$, and number of coils $n$
have been considered as random variables, and their
mean values are considered as design variables. After
accounting for the input data, the deterministic
problem can be expressed as


(39)  Minimize $(n + 2)\, D\, d^2$

subject to

[constraints on deflection, shear stress, surge frequency,
and outside diameter; expressions illegible in the source]

$d \ge 0.05$
$D \ge 0.05$
$n \ge 1$

The POD problem corresponding to (39) is in the
form (37)', with s = k = 3. The results of the
POD problem, for various values of $\beta$, are given in
Table 3. For this problem $\delta_d = 0.005$, $\delta_D = 0.05$,
and $\delta_n$ [value illegible in the source].
TABLE 2

RESULTS FOR 3-BAR TRUSS

$\beta$            FINAL COST  FINAL DESIGN           ACTIVE SET  SENSITIVITY OF ACTIVE CONSTRAINTS
0 (deterministic)  20.542      (8.91, 1.93, 4.25)     2,3         -
0.05               20.768      (9.30, 2.21, 3.82)     2,8         -10.4, -3.2, 4.1, -27.1, -5.4, -19.3
0.1                20.919      (9.35, 2.18, 3.90)     2,8         -5.1, -1.6, 2.0, -13.4, -2.6, -9.4
0.2                21.211      (9.33, 2.03, 4.23)     2,8         -2.5, -0.9, 0.9, -6.6, -1.1, -4.4
0.3                21.855      (9.80, 2.38, 3.98)     [missing]   1.3, -1.6, -0.7, 0.6, -2.8, -0.9
0.4                22.522      (9.98, 2.54, 4.14)     1,2         1.0, -1.1, -0.5, 0.4, -2.0, -0.6
0.5                23.153      (10.16, 2.63, 4.32)    1,2         0.7, -0.9, -0.4, 0.3, -1.5, -0.5
0.6                23.924      (10.37, 2.87, 4.52)    1,2         0.6, -0.7, -0.3, 0.2, -1.2, -0.4
0.7                24.272      (10.47, 2.79, 4.73)    1,2         0.5, -0.6, -0.25, 0.17, -1.0, -0.3
0.8                24.938      (10.65, 2.93, 4.91)    1,2         0.4, -0.5, -0.2, 0.15, -0.3, -0.25

NOTE: $\delta_1 = \delta_2 = \delta_3 = 0.2$

Active constraint no. 1 = natural frequency
Active constraint no. 2 = horizontal displacement, load case [illegible]
Active constraint no. 8 = vertical displacement, load case 2
TABLE 3

RESULTS FOR TENSION-COMPRESSION SPRING

$\beta$            FINAL COST  FINAL DESIGN           SENSITIVITY OF ACTIVE CONSTRAINTS
                   (x 0.001)
0 (deterministic)  154.0       (.064, .750, 2.945)    -
0.05               156.0       (.065, .752, 2.971)    6313.0, -19241.0, -439.0, 507.0, -37.0, 0.
0.10               153.0       (.065, .757, 2.971)    3389.0, -9608.0, -218.0, 252.2, -13.5, 0.
0.2                162.0       (.065, .763, 2.973)    1677.0, -4791.0, -107.0, 125.0, -9.2, 0.
0.4                171.0       (.066, .789, 2.975)    821.0, -2283.0, -51.4, 61.3, -4.5, 0.
0.6                130.0       (.067, .812, 2.977)    526.0, -1580.0, -33.0, 40.0, -3.0, 0.
0.8                189.0       (.063, .834, 2.980)    393.5, -1179.0, -23.9, 29.5, -2.3, 0.
1.0                200.0       (.063, .858, 2.982)    308.0, -938.0, -18.4, 23.1, -1.3, 0.
1.2                211.0       (.069, .883, 2.985)    251.3, -777.0, -14.7, 18.9, -1.5, 0.
1.4                222.0       (.070, .908, 2.983)    211.0, -662.6, -12.2, 15.9, -1.23, 0.
1.55               237.0       (.071, .942, 2.952)    174.0, -553.0, -9.3, 13.1, -1.0, 0.


DESIGN  OPTIMIZATION  WITH 
INNER  AND  OUTER  NOISE 


By 


Eric  Sandgren 

School  of  Mechanical  Engineering 
Purdue  University 


OBSERVATION: 


—  We  are  not  solving  the  design  optimization 
problem  incorrectly. 

—  We  are  solving  the  wrong  design  optimization 
problem 
PROBLEM WITH TRADITIONAL METHODS:


—  Design  is  a  fuzzy  endeavor. 

—  Optimization  requires  a  precise  mathematical 
representation. 


UNCERTAINTIES  IN  DESIGN: 


—  Specifications 

—  Material  properties 

—  Loading  conditions 

—  Service  environment 

—  Manufacturing  process 


—  Idealized  model 


INNER NOISE: Controllable variations in the
design variables caused by time and manufacturing
(wear and tolerances)


OUTER  NOISE:  Uncontrollable  variations  in 
design  parameters  (temperature,  humidity) 




GENERAL  MATHEMATICAL  PROGRAMMING  PROBLEM 

MINIMIZE $F(x)$; $x = [x_1, x_2, \ldots, x_N]^T$, $x \in R^N$

SUBJECT TO

$G_k(x) \ge 0, \quad k = 1, 2, 3, \ldots, K$

$H_j(x) = 0, \quad j = 1, 2, 3, \ldots, J$

$a_i \le x_i \le b_i, \quad i = 1, 2, 3, \ldots, N$
THE REAL DESIGN OPTIMIZATION
PROBLEM 


DESIGN  QUESTIONS: 


—  Why  is  50,000  PSI  acceptable  and  not  50,001? 

—  Is E = 30.0E+06 or is it really 30.1E+06?

—  Is Fx = 10,000 lb or is Fx = 10,100 lb?


METHODS  FOR  DEALING  WITH 
UNCERTAINTY 


—  Monte  Carlo  Simulation 

—  Stochastic  Programming 

—  Fuzzy  Optimization 

—  Design  for  Latitude 

—  Game  Theory  (Multiple  Objectives) 




PREPROCESSOR 
(user  defines  problem) 


THE  REAL  DESIGN  PROBLEM: 
OBJECTIVES  (GOALS) 

—  Minimize  weight 

—  Maximize  natural  frequency 

—  Minimize  chance  of  buckling 

—  Minimize  maximum  stress 


The Nonlinear Programming Formulation


Minimize Volume = $2\sqrt{2}\,x_1 + x_2$, with $x_i = A_i$


subject to


$g_1(x) = 20{,}000 - |\sigma_1| \ge 0$


$g_2(x) = 20{,}000 - |\sigma_2| \ge 0$


$g_3(x) = 20{,}000 - |\sigma_3| \ge 0$


and


$0.01 \le x_i \le 5.0, \quad i = 1, 2$




Variations  Considered  in  Truss  Design 

—  Change  in  load  from  15,000  lb  to  20,000  lb 

—  Change  in  direction  of  load  from  30  to  60 
degrees 


Point  Volume 


1.  Optimal  Design  for  3  Bar  Truss  with 
Variations  in  Load  Magnitude  and 
Direction. 


Point  Volume 


Goals  for  Nonlinear  Goal  Programming 


—  Minimize Volume: Volume $+\, d_1^- - d_1^+ = 0$


—  Minimize Max Stress: Max Stress $+\, d_2^- - d_2^+ = 0$


—  Minimize change in stress due to design
change.
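
The deviation variables in these goals can be illustrated with a short sketch. It assumes each goal is written as achieved + d_minus - d_plus = target with both deviations non-negative; the achieved values and the weights below are hypothetical numbers, not results from the talk.

```python
def goal_deviations(achieved, target=0.0):
    """Return (d_minus, d_plus) such that achieved + d_minus - d_plus == target."""
    d_minus = max(0.0, target - achieved)  # amount by which the goal is under-achieved
    d_plus = max(0.0, achieved - target)   # amount by which the goal is over-achieved
    return d_minus, d_plus


# Hypothetical achieved values for one candidate design.
volume = 2.7
max_stress = 18500.0

d1_minus, d1_plus = goal_deviations(volume)
d2_minus, d2_plus = goal_deviations(max_stress)

# With a target of zero, minimizing d_plus drives each quantity down.
# The weights are assumptions used only to put the goals on one scale.
weighted_objective = 1.0 * d1_plus + 1.0e-4 * d2_plus
```

Setting the target to zero turns "minimize volume" into "minimize the over-achievement d_plus", which is why a goal program can rank several such objectives by weight or priority.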
OBSERVATION:

If  we  can  precisely  define  the  uncertainties  for 
our  design  then  they  are  not  really  uncertain. 


TOLERANCE  TOLERANCE 

BAND  /REGION 
STATISTICAL APPROACHES TO
IC DESIGN AND FABRICATION
PROCESS CONTROL

Andrzej  J.  Strojwas 


Carnegie  Mellon  University 
Pittsburgh,  PA  15213 

UIED-88 
May  11,  1988 


PRESENTED  APPROACH: 

EIGHT  YEAR  RESEARCH  PROGRAM  AT  CMU 


S.  W.  DIRECTOR 
W.  MALY 
S.  R.  NASSIF 
C.  SPANOS 
OUTLINE

Stochastic  nature  of  VLSI  fabrication 


Statistical  simulation 


Tuning  -  parameter  identification 


Worst-case  analysis/design 


Yield  maximization 




Statistical  quality  control 


VLSI  DESIGN  AND  MANUFACTURING 


Goal 


minimize  cost  per  chip 


Short  design  cycle 


Fast  turn-around 


High  manufacturing  yield 


VLSI  DESIGN  AND  MANUFACTURING 
Solutions: 

•  Design  Automation 

o  system,  circuit  and  process  levels 

•  Yield  Maximization 

o  parametric  and  catastrophic 

•  Active  Process  Control 

o  monitoring,  diagnosis,  quality 
and  adaptive  control 


[Figure: fabrication process flow with in-line measurements.
Recoverable labels: Implantation; Resistivity Measurement;
(Field Oxide); Oxide Thickness Evaluation]

human error and equipment failures

fluctuations in process conditions
e.g., turbulent gas flow

fluctuations in materials

e.g., impurities in chemicals

variations in substrate

e.g., point defects, dislocations,
surface imperfections

lithographic spots (during mask fab and use)

e.g., transparent spots in opaque regions
Structural faults:


Changes in circuit topology
(e.g., shorts or opens)


-  may depend on bias


•  Hard performance faults

-  IC doesn't function properly
(e.g., some state transitions do not occur)


•  Soft performance faults

-  IC functions but response (e.g.,
speed or power) falls outside allowable limits


Primary Dependence of Faults on Disturbances


Simulation  of  the  Fabrication  Process 


PROCESS  DISTURBANCES 


•  Fluctuations  in  process  controls 
(e.g.  temperature  variations) 
account  for  small  portion  of 
variations  in  device  performance 

•  Physical  disturbances  (e.g.  diffusion 
coefficient,  oxide  growth  rate) 
crucial  to  model  device  parameter 
fluctuations  realistically 


Generation  of  Process  Disturbances 


Prometheus  -  A  Tuning  Tool  for  FABRICS 
Simplified Hierarchy


$N(0, \sigma_k)$, accounts for chip variation


Worst  Case  Analysis 


Defn:  The  set  of  parameters  which  cause  the 
performance  to  vary  in  the  same  direction 

$y = f(x_1, x_2)$


Worst  Cases: 


Traditional  Worst  Case  Analysis 


Parameter Based:


Device  _ 
Parameters 

(V,h .  B) 


Circuit 

Simulator 


Performance 


Ignores  correlation  between  parameters 

A  set  of  physical  parameters  that  produces 
the  worst  case  may  not  exist 

Results  very  pessimistic 


Device  Based: 


Device  _ 
Type 

(fast-fast, 

slow-slow) 


Circuit 

Simulator 


Performance 


Ignores  correlation  between  devices 
New Worst Case


•  Process  Disturbances  are  uncorrelated 

•  All  device  parameters  and  devices  correctly 
correlated 


Example:  RAM 

The methodology was applied to a 3-transistor cell RAM.
The circuit simulated consisted of the sense-amplifier, the
storage cell, the bit-line precharge logic, and the
read/write logic:

Precharge
Example: RAM cont.

Performances considered were:

•  $P$, $I_{peak}$: power dissipation and peak current

•  $\tau_{read}$, $\tau_{write}$: read and write times

Performances were most sensitive to:

•  $L_n$, line width variation of nitride layer

•  $L_p$, line width variation of poly layer

•  $D_B$, diffusivity of Boron

•  $D_{As}$, diffusivity of Arsenic

•  $R_{ox}$, oxide growth rate

•  $N_{sub}$, substrate concentration
Example: RAM cont.
Worst case performances, at $\mu \pm 1.5\sigma$:

performance      nominal    worst case   worst case
$P$              1.45 mW    2.28 mW      1.17 mW
$I_{peak}$       1.1 mA     1.8 mA       0.85 mA
$\tau_{read}$    16.9 ns    13.6 ns      20.8 ns
$\tau_{write}$   23.8 ns    15.7 ns      28.6 ns

Comparison  to 
Traditional  Worst  Case 

Performed  worst  case  analysis  with  respect  to  device 
parameters. 

Sensitivities  were  calculated  by  perturbation. 

Performances  were  most  sensitive  to: 

•  Depletion device: $V_{th}$, $KP$, and $\gamma$

•  Enhancement device: $V_{th}$, $KP$, and $\gamma$

Worst case was simulated with device parameters
perturbed by $1\sigma$ from their mean values.
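
The traditional device-parameter procedure on this slide (sensitivities by perturbation, then every parameter pushed one sigma in its unfavourable direction) can be sketched as below. The performance function `power_model`, the means, and the sigmas are hypothetical stand-ins for a circuit simulator and measured statistics. Because each parameter is moved independently, the sketch also shows why this approach ignores correlation.

```python
def sensitivities(perf, mean, sigma, rel_step=0.01):
    """Estimate d(perf)/d(parameter) by perturbing one parameter at a time."""
    base = perf(mean)
    sens = {}
    for name in mean:
        step = rel_step * sigma[name]
        shifted = dict(mean)
        shifted[name] += step
        sens[name] = (perf(shifted) - base) / step
    return sens


def worst_case_corner(perf, mean, sigma, n_sigma=1.0):
    """Move every parameter n_sigma in the direction that increases perf."""
    sens = sensitivities(perf, mean, sigma)
    corner = {}
    for name in mean:
        direction = 1.0 if sens[name] >= 0.0 else -1.0
        corner[name] = mean[name] + direction * n_sigma * sigma[name]
    return corner


def power_model(p):
    # Hypothetical performance function standing in for a circuit simulation.
    return 0.5 * p["KP"] * (5.0 - p["Vth"]) ** 2 + 0.1 * p["gamma"]


mean = {"Vth": 1.0, "KP": 2.0, "gamma": 0.4}
sigma = {"Vth": 0.1, "KP": 0.2, "gamma": 0.05}

corner = worst_case_corner(power_model, mean, sigma)
worst_power = power_model(corner)
```

The corner found this way may correspond to a parameter combination that the process can never produce, which is the source of the pessimism noted on the earlier slide.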


Comparison,  cont. 


Worst cases were found at $1\sigma$

performance   nominal    worst case       worst case
                         (Disturbances)   (Devices)
Power         1.45 mW    1.77 mW          2.30 mW


Process  Optimization 


Process  Yield  Maximization  Problem 


Cell  Optimization 


Multilevel  Optimization 


Designable  Parameters: 

Local  -  layout  of  individual  cells 

Global - fabrication process controls
common to all cells

Yield  Maximization  Problem: 

Given  the  JPDF  of  Process  Disturbances 
max  YIELD 
P,L 

subject  to  box  constraints  on: 

P  (process  control  capabilities) 

L  (lithography  capabilities  &  chip  area) 


Minimize Power-Delay


•  Gate  Oxidation  Time,  Tox 

•  Depletion  Threshold  Dose,  QD 

•  Minimum  Dimension,  m 

•  Ratio of Pullup/Pulldown, R


Performances 

1.  Power  Dissipation,  P 

2.  Rise time, $\tau_R$ (10% to 90% of $V$)

3.  Fall time, $\tau_F$

Test Conditions

Loaded with 0.01 pF, approximately 1 gate load.


Optimization 


Problem: $\min_X \; \mu(P)\,\mu(\tau_r)$

box constraints on designable parameters:

$0.75 \le T_{ox} \le 1.5$
$0.75 \le Q_D \le 1.5$
$0.75 \le m$
$0.75 \le R \le 2.0$

constraints on performances:
minicell area: $m^2 (4 + R) \le 24$
power: $\mu(P) + \sigma(P) \le 1$
delay: $\sigma(\tau_r) / \mu(\tau_r) \le 0.1$

Solution:

$T_{ox} = 1.17$, $Q_D = 0.96$, $m = 0.75$, $R = 0.873$


PROCESS  OBSERVABILITY 

Measurement  types 

-  in-line  measurements 

-  test  structure  measurements 

-  probe  measurements 

In-line  tests  (CD,  layer  thickness,  sheet  resistance) 
Scribe  lane  test  structures  (performance  evaluation) 
Probe  measurements  (until  first  fail) 

Needs for process control:

-  increase  number  of  in-line  measurements 

-  establish  relationships  between  in-line 
distributions  and  yield 

-  determine  selection  thresholds  and  quality 
control  procedures 
STAT. QUALITY CONTROL


1.  Continue  processing  if  predicted  yield  > 
Threshold  of  acceptability 


2.  Corrective measures -> Feed forward control


3.  Rework 


4.  Reject  the  lot  if  predicted  yield  <  Threshold  of 
rejection 

(i.e.  further  processing  is  not  cost  effective) 
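
The four outcomes listed above can be sketched as a simple decision rule. This is one possible reading of the slide, not a documented procedure: it assumes two thresholds, with corrective measures or rework applying to lots whose predicted yield falls between them. The threshold values in the usage lines are hypothetical.

```python
def lot_decision(predicted_yield, accept_threshold, reject_threshold):
    """Map a predicted yield onto a quality-control action for the lot."""
    if reject_threshold > accept_threshold:
        raise ValueError("reject_threshold must not exceed accept_threshold")
    if predicted_yield > accept_threshold:
        return "continue"            # rule 1: continue processing
    if predicted_yield < reject_threshold:
        return "reject"              # rule 4: further processing not cost effective
    return "correct_or_rework"       # rules 2 and 3: feed-forward correction or rework


# Hypothetical thresholds: accept above 85%, reject below 60%.
decisions = [lot_decision(y, 0.85, 0.60) for y in (0.93, 0.70, 0.50)]
```

Choosing between corrective measures and rework inside the middle band would need cost information that the slide does not give.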
STATISTICAL QUALITY
CONTROL FLOW

PROBLEM DECOMPOSITION


•  Shared information between observed in-lines
-> correlation


•  Grouping in-lines so that no shared information
within groups -> clustering


•  In-lines in a cluster depend on restricted set of
process parameters and disturbances


•  Minimal set of in-lines within each cluster that
needs to be observed -> factorization -> principal
components


•  Identification of factor in-lines
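
One way to picture the correlation and clustering steps is the sketch below. It is an assumption about how such a grouping could be done, not the method used in the talk: in-lines are linked whenever the absolute value of their sample correlation exceeds a threshold, and clusters are the connected groups. The in-line names and data are hypothetical.

```python
import math


def correlation(a, b):
    """Sample correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def cluster_inlines(data, threshold=0.1):
    """Group in-lines linked by |correlation| > threshold (connected components)."""
    names = list(data)
    parent = {name: name for name in names}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(correlation(data[a], data[b])) > threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in clusters.values())


# Hypothetical in-line measurements over four wafers.
inline_data = {
    "ox_thickness": [1.0, 2.0, 3.0, 4.0],
    "sheet_res": [2.0, 4.0, 6.0, 8.0],
    "line_width": [1.0, -1.0, -1.0, 1.0],
}
clusters = cluster_inlines(inline_data, threshold=0.1)
```

Within each resulting cluster, a principal component analysis could then pick the few factor in-lines that need to be observed.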


REGRESSION

[Figure: layered regression network. 4 inputs; layer 1,
4 variables; layer 2, 6 variables; layer 3 (final layer),
1 out of 15 chosen. Labeled nodes: Z11, Z24, Y = Z31 (best
fit). Legend: intermediate variable rejected; intermediate
variable chosen for next layer.]

•  Level 1: P, D -> Factor in-lines

•  Level 2: Factor in-lines, P, D -> X and
Non-factor in-lines


PROCESS  MODELING 


EXAMPLE  DECOMPOSITION 


•  4  device  NMOS  process 

•  Data  generated  by  FABRICS 


•  Clustering  with  threshold  of  0.1:  10  distinct 
clusters 


•  Principal  component  decomposition:  a  few 
factor  in-lines 

•  3 CMOS processes from TI -> tuned to
FABRICS -> to be used for further study
2-LAYERED DEVICE MODELS

[Figure: two-layer model for two device parameters.
Input labels: TEMPSRCDRIVE (process), TEMPGATEOX2
(process), ETRESHENERGY (process), LINEARDRY
(disturbance), P02GATE2 (process), TEMPPOLYDOPING
(process)]

QUALITY  CONTROL  FLOW 


SIMPLICIAL  APPROXIMATION 




•  Acceptability region specified in terms of circuit
performances -> constraints


•  Joint distribution in terms of in-lines
-> estimated during process


•  Map back acceptability region to in-lines
-> Simplicial approximation
SIMPLICIAL APPROXIMATION


QUALITY  CONTROL  DECISIONS 


•  Determine tolerances on distributions -> use of
process trajectory

o  Acceptance 

Statistical  distances  to  determine  how  far 
process  has  deviated 

o  Rejection 

Moments  of  distributions  of  single  in-lines 
Partial  yields 


QUALITY  CONTROL  contd. 
ACCEPTANCE  CRITERION 

•  Distance  between  jpdfs  using  nonparametric 
techniques 


•  Yield  sensitivity  to  magnitude  and  direction  of 
shift 


•  Change  in  yield  due  to  the  shifted  mean 
(e.g.  weighted  Mahalanobis  distance) 


Accept  if  change  is  small 
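
As an illustration of the distance used in the acceptance criterion, the sketch below computes a plain (unweighted) Mahalanobis distance between a nominal mean and a shifted mean for two in-lines. The weighting mentioned on the slide is not specified in the source and is not modelled here; the means and covariance matrices are hypothetical.

```python
import math


def mahalanobis_2d(mean_a, mean_b, cov):
    """Mahalanobis distance between two 2-D means under covariance matrix cov."""
    dx = mean_b[0] - mean_a[0]
    dy = mean_b[1] - mean_a[1]
    (sxx, sxy), (syx, syy) = cov
    det = sxx * syy - sxy * syx
    if det <= 0.0:
        raise ValueError("covariance matrix must be positive definite")
    # Quadratic form delta^T * inverse(cov) * delta, written out for the 2x2 case.
    d_squared = (syy * dx * dx - (sxy + syx) * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d_squared)


# Hypothetical nominal and shifted means of two in-line measurements.
shift_unit_cov = mahalanobis_2d((0.0, 0.0), (3.0, 4.0), ((1.0, 0.0), (0.0, 1.0)))
shift_scaled = mahalanobis_2d((0.0, 0.0), (2.0, 0.0), ((4.0, 0.0), (0.0, 1.0)))
```

A shift of two units along a direction whose standard deviation is two counts as a distance of one, so the measure expresses the shift in units of process spread rather than raw units.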


QUALITY  CONTROL  contd. 
REJECTION  CRITERION 


Dimension reduction by factorization
(quasi-independence)

Simplified approximation of acceptability
region by hyperbox

Appropriate coordinate transforms

Rejection based on single in-line distribution
-> partial yields -> tolerances on in-line
distribution


Acceptability 


V2 


YIELD  PREDICTION 


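Integrating jpdf over acceptability region in
in-line or circuit performance space


$Y = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \phi(u)\, f(u)\, du_1 \cdots du_n$

($\phi$: indicator of the acceptability region, $f$: jpdf)


Coupling -> large computing time for integration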


Low  dimensionality  for  control 
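
The yield integral can be approximated by sampling, as in the sketch below. It assumes two independent, standardized in-lines and a hyperbox acceptability region of plus or minus two sigma; both the distribution and the region are hypothetical choices made so that the result can be checked against a known value.

```python
import random


def estimate_yield(sample_inlines, acceptable, n_samples, rng):
    """Fraction of sampled in-line vectors that fall inside the acceptability region."""
    hits = 0
    for _ in range(n_samples):
        if acceptable(sample_inlines(rng)):
            hits += 1
    return hits / n_samples


def sample_inlines(rng):
    # Hypothetical jpdf: two independent standardized in-line measurements.
    return (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))


def acceptable(u):
    # Hypothetical hyperbox acceptability region.
    return all(-2.0 <= ui <= 2.0 for ui in u)


rng = random.Random(42)
predicted_yield = estimate_yield(sample_inlines, acceptable, 20000, rng)
```

For independent in-lines and a hyperbox region the yield factors into a product of partial yields (here about 0.9545 squared), which is the simplification the rejection criterion relies on; coupled in-lines would require the full multidimensional integral.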


ACCEPTABILITY 

REGION 


SYSTEM  OVERVIEW 

Data  generation  from  device 
and  circuit  simulators 

BUILDREG 
EXAMPLE


Parameter:  NE — CGBO,  Acceptance  level:  85% 


Observables:  thsio2gate,  xjf4 


Acceptance: 

Partial  yield  due  to  xjf4  =  99% 

Partial  yield  due  to  thsio2gate2  =  93.8% 

The  Mahalanobis  distance  between  the  two 
distributions  =  -6.3005e-03 
The  estimated  yield  is  =  93.0% 

Rejection: 

Partial  yield  due  to  xjf4  =  99% 

Partial  yield  due  to  thsio2gate2  =  77% 

The  Mahalanobis  distance  between  the  two 
distributions  =  9.7272e-02 
The  estimated  yield  is  =  79.5% 


EXAMPLE 


Parameters:  NE — VTO  and  ND — KP, 

Acceptance  level:  75% 

Observables:  thsio2gate,  xje2,  tempsrcdrive 

Partial  yield  due  to  thsio2gate2  =  79% 

Partial  yield  due  to  xje2  =  98% 

The  Mahalanobis  distance  between  the  two 

distributions  =  2.66e-2 

The  estimated  yield  is  =  71% 


CURRENT  WORK 


Statistical  simulation 

-  sampling  techniques 

-  analytical  mapping  of  jpdf's 

Yield  prediction 

-  dimensionality  reduction 
techniques 

Statistical  design 

-  inverse  mapping  techniques 

-  factor-splitting approach


Statistical process control

-  nonparametric methods


Statistical  Optimization  for  Computational  Models 

Kishore  Singhal 

AT&T  Bell  Laboratories 
1247  South  Cedar  Crest  Blvd. 

Allentown,  PA  18103 


Realistic  engineering  problems  of  expected  product  cost  minimization  and  reliability 
maximization  in  the  presence  of  manufacturing  fluctuations  and  parameter  variations 
due  to  environmental  and  age  related  effects  can  be  formulated  mathematically  as 
constrained  statistical  optimization  problems.  Parametric  Sampling  is  a  particular 
technique  for  solving  such  problems  when  the  system  performance  can  be  obtained 
through  simulation  using  computational  models. 

A  database  containing  the  results  from  a  small  number  of  simulations  is  first  created. 
Parametric  Sampling  allows  us  to  estimate  the  objective  function,  the  constraints  and 
their  gradients  not  only  at  the  initial  set  of  design  parameters  but  also  at  new  design 
points  generated  by  the  optimization  algorithm.  As  needed,  additional  sample  points 
are  added  to  the  database  to  ensure  estimation  accuracy. 

Sensitivity  studies  to  determine  the  influence  of  specification  changes  and  departures 
from  assumed  statistical  distributions  are  possible  with  minimal  computational  cost. 
Experience  with  electrical  and  mechanical  systems  shows  that  substantial  improvement 
in  the  objective  functions  is  possible. 


Conference  on  Uncertainty  in 
Engineering Design, 1988


Ref:  Statistical  Design  Centering  and  Tolerancing  Using  Parametric  Sampling.  IEEE 
Trans.  Circuits  and  Systems,  July  1981. 
Computer Simulation vs
Physical  Experimentation 
+  Often  faster  (real  time) 

+  Less  expensive 

+  Easy  to  explore  areas  of  operation  where  physical 
experimentation  may  be  difficult 

-  Need  a  computer  model 

-  Need  reasonable  characterization  of  probability 
model 

Key  difference  in  Optimization  Strategy 

•  We  can  distort  reality  to  assist  us 

•  Analysis  and  Decision  steps  can  be  easily 
interlaced 

•  Computational  complexity  is  no  longer  an  issue 
Design for Manufacturability, Reliability
and  Minimum  Product  Life  Cycle  Cost 


Include 

•  Process  variation 

•  Tolerance  vs  cost  tradeoffs 

•  Environmental  condition  variations 

•  Limits  on  performance  degradation  with  age 

•  Testing  and  field  repair  costs 

•  Cost  of  lost  goodwill 

•  etc  etc  etc 


in  the  design  phase  itself 


Simulator  Components 
Statistical Simulator Components


Distribution $f(x;\theta)$, with
$\theta$ = {nominals, tolerances, correlations}


Random 

variables 


Noise 

Temperature 

Humidity 


Simple  Monte  Carlo 


Decision: modify $\theta$


Variance  Reduction  by  Importance  Sampling 
Ordinary Monte Carlo

Evaluate high dimensional integrals of the form

$E(v) = \int_{-\infty}^{\infty} v(x)\, f(x;\theta)\, dx$

where $v$ is a function computed through simulation
and $f(x;\theta)$ is a density with parameters $\theta$ by
sampling as

$\hat{v} = \frac{1}{N} \sum_{i=1}^{N} v(x_i)$

where $x_i$ are samples drawn from the distribution
$f(x;\theta)$


Importance Sampling

$E(v) = \int_{-\infty}^{\infty} v(x)\, f(x;\theta)\, dx$

can be written as

$E(v) = \int_{-\infty}^{\infty} v(x)\, \frac{f(x;\theta)}{h(x;\theta_h)}\, h(x;\theta_h)\, dx$

where $h(x;\theta_h)$ is some other density

The second integral can be approximately evaluated
by sampling as

$\hat{v} = \frac{1}{N} \sum_{i=1}^{N} v(x_i)\, \frac{f(x_i;\theta)}{h(x_i;\theta_h)}$

where $x_i$ are samples drawn from the distribution
$h(x;\theta_h)$ selected to reduce the variance of $\hat{v}$
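
A small numerical sketch of the importance sampling estimator follows. The setting is an assumption chosen for checkability: $f$ is a standard normal density, $v$ is the indicator of the rare event $x > 3$, and the sampling density $h$ is a unit-variance normal centred at 3 so that most samples land where $v$ is non-zero.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def v(x):
    # Hypothetical performance function: indicator of the rare event x > 3.
    return 1.0 if x > 3.0 else 0.0


def importance_estimate(n_samples, rng, mu_h=3.0):
    """Estimate E_f[v] with samples drawn from h = N(mu_h, 1) instead of f = N(0, 1)."""
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(mu_h, 1.0)
        weight = normal_pdf(x, 0.0, 1.0) / normal_pdf(x, mu_h, 1.0)
        total += v(x) * weight
    return total / n_samples


rng = random.Random(0)
estimate = importance_estimate(20000, rng)
exact = 0.5 * math.erfc(3.0 / math.sqrt(2.0))  # P(x > 3) for a standard normal
```

With ordinary Monte Carlo only about 27 of 20000 samples would fall in the event, so the estimate would be far noisier for the same number of simulations.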
Parametric Sampling

$\hat{v}(\theta) = \frac{1}{N} \sum_{i=1}^{N} v(x_i)\, \frac{f(x_i;\theta)}{h(x_i;\theta_h)}$


where $x_i$ are samples drawn from the distribution
$h(x;\theta_h)$ independent of $\theta$ !!

•  Functional form of $f(x_i;\theta)$ is known and $\hat{v}$ can
be evaluated for any $\theta$ without performing a new
Monte Carlo

•  First and higher order derivatives of $\hat{v}$ are easily
computed allowing the use of powerful
optimization tools

Parametric Sampling thus changes a statistical
optimization problem into a standard deterministic
problem
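
The idea can be sketched with a one-dimensional example. Everything specific here is an assumption for illustration: the design parameter $\theta$ is the mean of a unit-variance normal density $f$, the simulated performance $v$ is a pass/fail indicator for $|x| \le 1$, and the sampling density $h$ is a wider normal. The simulations are run once and stored; the yield and its derivative are then re-evaluated for any $\theta$ from the stored database.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def in_spec(x):
    # Hypothetical simulated performance: 1 if the sample meets the specification.
    return 1.0 if abs(x) <= 1.0 else 0.0


SIGMA_H = 2.0  # sampling density h = N(0, SIGMA_H^2), independent of theta
rng = random.Random(1)
samples = [rng.gauss(0.0, SIGMA_H) for _ in range(20000)]

# Database: each sample is simulated once; v and h are stored with it.
database = [(x, in_spec(x), normal_pdf(x, 0.0, SIGMA_H)) for x in samples]


def yield_and_gradient(mu):
    """Yield estimate and its derivative in mu, reusing the stored simulations."""
    total, grad = 0.0, 0.0
    for x, v, h in database:
        weight = normal_pdf(x, mu, 1.0) / h
        total += v * weight
        grad += v * weight * (x - mu)  # d f / d mu = f * (x - mu) for unit variance
    n = len(database)
    return total / n, grad / n


def exact_yield(mu):
    return normal_cdf(1.0 - mu) - normal_cdf(-1.0 - mu)


yield_at_0, grad_at_0 = yield_and_gradient(0.0)
yield_at_half, grad_at_half = yield_and_gradient(0.5)
```

Because `yield_and_gradient` is a smooth deterministic function of `mu` once the database is fixed, it can be handed directly to a gradient-based optimizer, which is the point made on the slide.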


Some  Details 
•  Stochastic Approximation principles used to force
convergence in probability


•  The multivariate normal sampling density is
related to the Hessian information obtained from
the optimizer and enables navigation along
narrow ridges


•  Quasi random numbers are used to reduce
estimator variance


•  Sample pooling to reduce sample size


•  Ratio estimators improve accuracy


•  Jackknife for bias reduction and error estimation


•  Sensitivity to parameter distributions easily
computed


•  Re-design following specification changes is
simple
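
The jackknife item in the list above can be illustrated with a generic sketch. It is not taken from the talk: the estimator being corrected is the plug-in (divide by n) variance, chosen because the jackknife correction is known to reproduce the unbiased sample variance exactly, which makes the result easy to check. The data values are hypothetical.

```python
import math
import statistics


def jackknife(estimator, data):
    """Return (bias-corrected estimate, estimated bias, standard error)."""
    n = len(data)
    full = estimator(data)
    # Leave-one-out estimates.
    loo = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    loo_mean = sum(loo) / n
    bias = (n - 1) * (loo_mean - full)
    corrected = full - bias
    std_error = math.sqrt((n - 1) / n * sum((t - loo_mean) ** 2 for t in loo))
    return corrected, bias, std_error


data = [2.1, 3.4, 1.9, 5.6, 4.4, 3.8]  # hypothetical simulation outputs
corrected, bias, std_error = jackknife(statistics.pvariance, data)
```

The same routine applies unchanged to a ratio estimator, where the bias has no closed form and the jackknife supplies both the correction and an error estimate.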
USING

OVERVIEW


OBJECTIVE:  DISCUSS  THESE  PAPERS  IN 

LIGHT  OF  SIGNIFICANT 
DESIGN  FOR  QUALITY 
EFFORTS  USING  PHYSICAL 
EXPERIMENTATION 

•  LOSS  FUNCTIONS  AND  ADJUSTMENT 
PARAMETERS 

•  EXPERIMENTAL  DESIGNS 

•  NOISE 

•  OPTIMIZATION 
DESIGN FOR QUALITY OBJECTIVE

USE  NOMINAL  DESIGN  VALUES  TO 
MINIMIZE  THE  INFLUENCE  OF  NOISE 
ON  THE  PERFORMANCE  OF  THE  DESIGN 
•  STATISTICAL DESIGN CENTERING
COMPUTER  EXPERIMENTS 

•  ROBUST  DESIGN 
PHYSICAL EXPERIMENTS


PHYSICAL  VS.  COMPUTER  EXPERIMENTATION 
LOSS FUNCTIONS AND ADJUSTMENT PARAMETERS


"SQUARED ERROR" VS 0-1 LOSS

—  ECONOMIC  ADVANTAGES  FOR  BEING 
CLOSE  TO  TARGET 

—  MAY  REQUIRE  MULTICRITERIA 
OPTIMIZATION 

ADJUSTMENT  PARAMETERS 

—  HOW  TO  IDENTIFY 

—  HOW  TO  CHOOSE  A  PERFORMANCE 
MEASURE 


STATISTICAL  DESIGN  OPTIMIZATION 
BASIC MONTE-CARLO METHOD


modified  monte-carlo  method 
global  approximation  of  the  objective  function 